Search | arXiv e-print repository

Can AI Read Between The Lines? Benchmarking LLMs On Financial Nuance

Authors: Dominick Kubica, Dylan T. Gordon, Nanami Emura, Derleen Saini, Charlie Goldenberg

Abstract: As of 2025, Generative Artificial Intelligence (GenAI) has become a central tool for productivity across industries. Beyond text generation, GenAI now plays a critical role in coding, data analysis, and research workflows. As large language models (LLMs) continue to evolve, it is essential to assess the reliability and accuracy of their outputs, especially in specialized, high-stakes domains like… ▽ More As of 2025, Generative Artificial Intelligence (GenAI) has become a central tool for productivity across industries. Beyond text generation, GenAI now plays a critical role in coding, data analysis, and research workflows. As large language models (LLMs) continue to evolve, it is essential to assess the reliability and accuracy of their outputs, especially in specialized, high-stakes domains like finance. Most modern LLMs transform text into numerical vectors, which are used in operations such as cosine similarity searches to generate responses. However, this abstraction process can lead to misinterpretation of emotional tone, particularly in nuanced financial contexts. While LLMs generally excel at identifying sentiment in everyday language, these models often struggle with the nuanced, strategically ambiguous language found in earnings call transcripts. Financial disclosures frequently embed sentiment in hedged statements, forward-looking language, and industry-specific jargon, making it difficult even for human analysts to interpret consistently, let alone AI models. This paper presents findings from the Santa Clara Microsoft Practicum Project, led by Professor Charlie Goldenberg, which benchmarks the performance of Microsoft's Copilot, OpenAI's ChatGPT, Google's Gemini, and traditional machine learning models for sentiment analysis of financial text. Using Microsoft earnings call transcripts, the analysis assesses how well LLM-derived sentiment correlates with market sentiment and stock movements and evaluates the accuracy of model outputs. Prompt engineering techniques are also examined to improve sentiment analysis results. Visualizations of sentiment consistency are developed to evaluate alignment between tone and stock performance, with sentiment trends analyzed across Microsoft's lines of business to determine which segments exert the greatest influence. △ Less

Submitted 21 May, 2025; originally announced May 2025.

Comments: 6 pages, 4 figures. Research conducted as part of a Microsoft-sponsored Capstone Project at Santa Clara University

ACM Class: I.2.6; I.2.7

arXiv:2503.24296 [pdf, other]

Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning

Authors: Yubo Zhang, Pedro Botelho, Trevor Gordon, Gil Zussman, Igor Kadota

Abstract: We consider a decentralized wireless network with several source-destination pairs sharing a limited number of orthogonal frequency bands. Sources learn to adapt their transmissions (specifically, their band selection strategy) over time, in a decentralized manner, without sharing information with each other. Sources can only observe the outcome of their own transmissions (i.e., success or collisi… ▽ More We consider a decentralized wireless network with several source-destination pairs sharing a limited number of orthogonal frequency bands. Sources learn to adapt their transmissions (specifically, their band selection strategy) over time, in a decentralized manner, without sharing information with each other. Sources can only observe the outcome of their own transmissions (i.e., success or collision), having no prior knowledge of the network size or of the transmission strategy of other sources. The goal of each source is to maximize their own throughput while striving for network-wide fairness. We propose a novel fully decentralized Reinforcement Learning (RL)-based solution that achieves fairness without coordination. The proposed Fair Share RL (FSRL) solution combines: (i) state augmentation with a semi-adaptive time reference; (ii) an architecture that leverages risk control and time difference likelihood; and (iii) a fairness-driven reward structure. We evaluate FSRL in more than 50 network settings with different number of agents, different amounts of available spectrum, in the presence of jammers, and in an ad-hoc setting. Simulation results suggest that, when we compare FSRL with a common baseline RL algorithm from the literature, FSRL can be up to 89.0% fairer (as measured by Jain's fairness index) in stringent settings with several sources and a single frequency band, and 48.1% fairer on average. △ Less

Submitted 31 March, 2025; originally announced March 2025.

Comments: To appear in WiOpt 2025

arXiv:2412.16720 [pdf, other]

OpenAI o1 System Card

Authors: OpenAI, :, Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Aleksander Madry, Alex Beutel, Alex Carney, Alex Iftimie, Alex Karpenko, Alex Tachard Passos, Alexander Neitz, Alexander Prokofiev, Alexander Wei, Allison Tam, Ally Bennett, Ananya Kumar, Andre Saraiva, Andrea Vallone, Andrew Duberstein, Andrew Kondrich , et al. (238 additional authors not shown)

Abstract: The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-ar… ▽ More The o1 model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment. This leads to state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence. Our results underscore the need for building robust alignment methods, extensively stress-testing their efficacy, and maintaining meticulous risk management protocols. This report outlines the safety work carried out for the OpenAI o1 and OpenAI o1-mini models, including safety evaluations, external red teaming, and Preparedness Framework evaluations. △ Less

Submitted 21 December, 2024; originally announced December 2024.

arXiv:2007.08501 [pdf, other]

Accelerating 3D Deep Learning with PyTorch3D

Authors: Nikhila Ravi, Jeremy Reizenstein, David Novotny, Taylor Gordon, Wan-Yen Lo, Justin Johnson, Georgia Gkioxari

Abstract: Deep learning has significantly improved 2D image recognition. Extending into 3D may advance many new applications including autonomous vehicles, virtual and augmented reality, authoring 3D content, and even improving 2D recognition. However despite growing interest, 3D deep learning remains relatively underexplored. We believe that some of this disparity is due to the engineering challenges invol… ▽ More Deep learning has significantly improved 2D image recognition. Extending into 3D may advance many new applications including autonomous vehicles, virtual and augmented reality, authoring 3D content, and even improving 2D recognition. However despite growing interest, 3D deep learning remains relatively underexplored. We believe that some of this disparity is due to the engineering challenges involved in 3D deep learning, such as efficiently processing heterogeneous data and reframing graphics operations to be differentiable. We address these challenges by introducing PyTorch3D, a library of modular, efficient, and differentiable operators for 3D deep learning. It includes a fast, modular differentiable renderer for meshes and point clouds, enabling analysis-by-synthesis approaches. Compared with other differentiable renderers, PyTorch3D is more modular and efficient, allowing users to more easily extend it while also gracefully scaling to large meshes and images. We compare the PyTorch3D operators and renderer with other implementations and demonstrate significant speed and memory improvements. We also use PyTorch3D to improve the state-of-the-art for unsupervised 3D mesh and point cloud prediction from 2D images on ShapeNet. PyTorch3D is open-source and we hope it will help accelerate research in 3D deep learning. △ Less

Submitted 16 July, 2020; originally announced July 2020.

Comments: tech report

arXiv:1406.3860 [pdf, other]

The Minimum Bends in a Polyline Drawing with Fixed Vertex Locations

Authors: Taylor Gordon

Abstract: We consider embeddings of planar graphs in $R^2$ where vertices map to points and edges map to polylines. We refer to such an embedding as a polyline drawing, and ask how few bends are required to form such a drawing for an arbitrary planar graph. It has long been known that even when the vertex locations are completely fixed, a planar graph admits a polyline drawing where edges bend a total of… ▽ More We consider embeddings of planar graphs in $R^2$ where vertices map to points and edges map to polylines. We refer to such an embedding as a polyline drawing, and ask how few bends are required to form such a drawing for an arbitrary planar graph. It has long been known that even when the vertex locations are completely fixed, a planar graph admits a polyline drawing where edges bend a total of $O(n^2)$ times. Our results show that this number of bends is optimal. In particular, we show that $Ω(n^2)$ total bends is required to form a polyline drawing on any set of fixed vertex locations for almost all planar graphs. This result generalizes all previously known lower bounds, which only applied to convex point sets, and settles 2 open problems. △ Less

Submitted 15 June, 2014; originally announced June 2014.

Comments: 12 pages, 5 figures

arXiv:1206.0514 [pdf, other]

Simultaneous Embeddings with Vertices Mapping to Pre-Specified Points

Authors: Taylor Gordon

Abstract: We discuss the problem of embedding graphs in the plane with restrictions on the vertex mapping. In particular, we introduce a technique for drawing planar graphs with a fixed vertex mapping that bounds the number of times edges bend. An immediate consequence of this technique is that any planar graph can be drawn with a fixed vertex mapping so that edges map to piecewise linear curves with at mos… ▽ More We discuss the problem of embedding graphs in the plane with restrictions on the vertex mapping. In particular, we introduce a technique for drawing planar graphs with a fixed vertex mapping that bounds the number of times edges bend. An immediate consequence of this technique is that any planar graph can be drawn with a fixed vertex mapping so that edges map to piecewise linear curves with at most $3n + O(1)$ bends each. By considering uniformly random planar graphs, we show that $2n + O(1)$ bends per edge is sufficient on average. To further utilize our technique, we consider simultaneous embeddings of $k$ uniformly random planar graphs with vertices mapping to a fixed, common point set. We explain how to achieve such a drawing so that edges map to piecewise linear curves with $O(n^{1-1/k})$ bends each, which holds with overwhelming probability. This result improves upon the previously best known result of O(n) bends per edge for the case where $k \geq 2$. Moreover, we give a lower bound on the number of bends that matches our upper bound, proving our results are optimal. △ Less

Submitted 4 June, 2012; originally announced June 2012.

Comments: 12 pages (plus appendix), 7 figures

Showing 1–6 of 6 results for author: Gordon, T