-
FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading
Authors:
Guojun Xiong,
Zhiyang Deng,
Keyi Wang,
Yupeng Cao,
Haohang Li,
Yangyang Yu,
Xueqing Peng,
Mingquan Lin,
Kaleb E Smith,
Xiao-Yang Liu,
Jimin Huang,
Sophia Ananiadou,
Qianqian Xie
Abstract:
Large language models (LLMs) fine-tuned on multimodal financial data have demonstrated impressive reasoning capabilities in various financial tasks. However, they often struggle with multi-step, goal-oriented scenarios in interactive financial markets, such as trading, where complex agentic approaches are required to improve decision-making. To address this, we propose FLAG-Trader, a unified architecture integrating linguistic processing (via LLMs) with gradient-driven reinforcement learning (RL) policy optimization, in which a partially fine-tuned LLM acts as the policy network, leveraging pre-trained knowledge while adapting to the financial domain through parameter-efficient fine-tuning. Through policy gradient optimization driven by trading rewards, our framework not only enhances LLM performance in trading but also improves results on other financial-domain tasks. We present extensive empirical evidence to validate these enhancements.
Submitted 18 February, 2025; v1 submitted 16 February, 2025;
originally announced February 2025.
-
INVESTORBENCH: A Benchmark for Financial Decision-Making Tasks with LLM-based Agent
Authors:
Haohang Li,
Yupeng Cao,
Yangyang Yu,
Shashidhar Reddy Javaji,
Zhiyang Deng,
Yueru He,
Yuechen Jiang,
Zining Zhu,
Koduvayur Subbalakshmi,
Guojun Xiong,
Jimin Huang,
Lingfei Qian,
Xueqing Peng,
Qianqian Xie,
Jordan W. Suchow
Abstract:
Recent advancements have underscored the potential of large language model (LLM)-based agents in financial decision-making. Despite this progress, the field currently encounters two main challenges: (1) the lack of a comprehensive LLM agent framework adaptable to a variety of financial tasks, and (2) the absence of standardized benchmarks and consistent datasets for assessing agent performance. To tackle these issues, we introduce InvestorBench, the first benchmark specifically designed for evaluating LLM-based agents in diverse financial decision-making contexts. InvestorBench enhances the versatility of LLM-enabled agents by providing a comprehensive suite of tasks applicable to different financial products, including single equities like stocks, cryptocurrencies and exchange-traded funds (ETFs). Additionally, we assess the reasoning and decision-making capabilities of our agent framework using thirteen different LLMs as backbone models, across various market environments and tasks. Furthermore, we have curated a diverse collection of open-source, multi-modal datasets and developed a comprehensive suite of environments for financial decision-making. This establishes a highly accessible platform for evaluating financial agents' performance across various scenarios.
Submitted 24 December, 2024;
originally announced December 2024.
-
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Authors:
Yuzhe Yang,
Yifei Zhang,
Yan Hu,
Yilin Guo,
Ruoli Gan,
Yueru He,
Mingcong Lei,
Xiao Zhang,
Haining Wang,
Qianqian Xie,
Jimin Huang,
Honghai Yu,
Benyou Wang
Abstract:
This paper introduces the UCFE: User-Centric Financial Expertise benchmark, an innovative framework designed to evaluate the ability of large language models (LLMs) to handle complex real-world financial tasks. The UCFE benchmark adopts a hybrid approach that combines human expert evaluations with dynamic, task-specific interactions to simulate the complexities of evolving financial scenarios. Firstly, we conducted a user study involving 804 participants, collecting their feedback on financial tasks. Secondly, based on this feedback, we created our dataset that encompasses a wide range of user intents and interactions. This dataset serves as the foundation for benchmarking 11 LLM services using the LLM-as-Judge methodology. Our results show a significant alignment between benchmark scores and human preferences, with a Pearson correlation coefficient of 0.78, confirming the effectiveness of the UCFE dataset and our evaluation approach. The UCFE benchmark not only reveals the potential of LLMs in the financial domain but also provides a robust framework for assessing their performance and user satisfaction.
Submitted 7 February, 2025; v1 submitted 17 October, 2024;
originally announced October 2024.
-
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Authors:
Jimin Huang,
Mengxi Xiao,
Dong Li,
Zihao Jiang,
Yuzhe Yang,
Yifei Zhang,
Lingfei Qian,
Yan Wang,
Xueqing Peng,
Yang Ren,
Ruoyu Xiang,
Zhengyu Chen,
Xiao Zhang,
Yueru He,
Weiguang Han,
Shunian Chen,
Lihang Shen,
Daniel Kim,
Yangyang Yu,
Yupeng Cao,
Zhiyang Deng,
Haohang Li,
Duanyu Feng,
Yongfu Dai,
VijayaSai Somasundaram,
et al. (19 additional authors not shown)
Abstract:
Financial LLMs hold promise for advancing financial tasks and domain-specific applications. However, they are limited by scarce corpora, weak multimodal capabilities, and narrow evaluations, making them less suited for real-world application. To address this, we introduce Open-FinLLMs, the first open-source multimodal financial LLMs designed to handle diverse tasks across text, tabular, time-series, and chart data, excelling in zero-shot, few-shot, and fine-tuning settings. The suite includes FinLLaMA, pre-trained on a comprehensive 52-billion-token corpus; FinLLaMA-Instruct, fine-tuned with 573K financial instructions; and FinLLaVA, enhanced with 1.43M multimodal tuning pairs for strong cross-modal reasoning. We comprehensively evaluate Open-FinLLMs across 14 financial tasks, 30 datasets, and 4 multimodal tasks in zero-shot, few-shot, and supervised fine-tuning settings, introducing two new multimodal evaluation datasets. Our results show that Open-FinLLMs outperforms advanced financial and general LLMs such as GPT-4, across financial NLP, decision-making, and multi-modal tasks, highlighting their potential to tackle real-world challenges. To foster innovation and collaboration across academia and industry, we release all codes (https://anonymous.4open.science/r/PIXIU2-0D70/B1D7/LICENSE) and models under OSI-approved licenses.
Submitted 6 June, 2025; v1 submitted 20 August, 2024;
originally announced August 2024.
-
Predicting Failure of P2P Lending Platforms through Machine Learning: The Case in China
Authors:
Jen-Yin Yeh,
Hsin-Yu Chiu,
Jhih-Huei Huang
Abstract:
This study employs machine learning models to predict the failure of Peer-to-Peer (P2P) lending platforms, specifically in China. By employing the filter method and wrapper method with forward selection and backward elimination, we establish a rigorous and practical procedure that ensures the robustness and importance of variables in predicting platform failures. The research identifies a set of robust variables that consistently appear in the feature subsets across different selection methods and models, suggesting their reliability and relevance in predicting platform failures. The study highlights that reducing the number of variables in the feature subset leads to an increase in the false acceptance rate while the performance metrics remain stable, with an AUC value of approximately 0.96 and an F1 score of around 0.88. The findings of this research provide significant practical implications for regulatory authorities and investors operating in the Chinese P2P lending industry.
Submitted 24 November, 2023;
originally announced November 2023.
-
The Wall Street Neophyte: A Zero-Shot Analysis of ChatGPT Over MultiModal Stock Movement Prediction Challenges
Authors:
Qianqian Xie,
Weiguang Han,
Yanzhao Lai,
Min Peng,
Jimin Huang
Abstract:
Recently, large language models (LLMs) like ChatGPT have demonstrated remarkable performance across a variety of natural language processing tasks. However, their effectiveness in the financial domain, specifically in predicting stock market movements, remains to be explored. In this paper, we conduct an extensive zero-shot analysis of ChatGPT's capabilities in multimodal stock movement prediction, on three tweets and historical stock price datasets. Our findings indicate that ChatGPT is a "Wall Street Neophyte" with limited success in predicting stock movements, as it underperforms not only state-of-the-art methods but also traditional methods like linear regression using price features. Despite the potential of Chain-of-Thought prompting strategies and the inclusion of tweets, ChatGPT's performance remains subpar. Furthermore, we observe limitations in its explainability and stability, suggesting the need for more specialized training or fine-tuning. This research provides insights into ChatGPT's capabilities and serves as a foundation for future work aimed at improving financial market analysis and prediction by leveraging social media sentiment and historical stock data.
Submitted 28 April, 2023; v1 submitted 10 April, 2023;
originally announced April 2023.
-
Mastering Pair Trading with Risk-Aware Recurrent Reinforcement Learning
Authors:
Weiguang Han,
Jimin Huang,
Qianqian Xie,
Boyi Zhang,
Yanzhao Lai,
Min Peng
Abstract:
Although pair trading is the simplest hedging strategy for an investor to eliminate market risk, it is still a great challenge for reinforcement learning (RL) methods to perform pair trading at the level of a human expert. It requires RL methods to make thousands of correct actions that nevertheless have no obvious relation to the overall trading profit, and to reason over the infinite states of the time-varying market, most of which have never appeared in history. However, existing RL methods ignore the temporal connections between asset price movements and the risk of the trades performed. This leads to frequent trades with high transaction costs and potential losses, which barely reach the level of human trading expertise. Therefore, we introduce CREDIT, a risk-aware agent capable of learning to exploit long-term trading opportunities in pair trading similar to a human expert. CREDIT is the first to apply a bidirectional GRU along with a temporal attention mechanism to fully consider the temporal correlations embedded in the states, which allows CREDIT to capture long-term patterns in the price movements of two assets to earn higher profit. We also design a risk-aware reward, inspired by economic theory, that models both the profit and the risk of the trades during the trading period. It helps our agent master pair trading with a robust trading preference that avoids risky trades with possibly high returns and losses. Experiments show that it outperforms existing reinforcement learning methods in pair trading and achieves a significant profit over five years of U.S. stock data.
Submitted 1 April, 2023;
originally announced April 2023.
-
Company Competition Graph
Authors:
Yanci Zhang,
Yutong Lu,
Haitao Mao,
Jiawei Huang,
Cien Zhang,
Xinyi Li,
Rui Dai
Abstract:
Financial market participants frequently rely on numerous business relationships to make investment decisions. Investors can learn about potential risks and opportunities associated with other connected entities through these corporate connections. Nonetheless, human annotation of a large corpus to extract such relationships is highly time-consuming, not to mention that it requires a considerable amount of industry expertise and professional training. Meanwhile, we have yet to observe means to generate reliable knowledge graphs of corporate relationships due to the lack of impartial and granular data sources. This study proposes a system to process financial reports and construct the public competitor graph to fill the void. Our method can retrieve more than 83% of the competition relationships of the S&P 500 index companies. Based on the output from our system, we construct a knowledge graph with more than 700 nodes and 1200 edges. A demo interactive graph interface is available.
Submitted 1 April, 2023;
originally announced April 2023.
-
Select and Trade: Towards Unified Pair Trading with Hierarchical Reinforcement Learning
Authors:
Weiguang Han,
Boyi Zhang,
Qianqian Xie,
Min Peng,
Yanzhao Lai,
Jimin Huang
Abstract:
Pair trading is one of the most effective statistical arbitrage strategies, seeking a neutral profit by hedging a pair of selected assets. Existing methods generally decompose the task into two separate steps: pair selection and trading. However, the decoupling of two closely related subtasks can block information propagation and lead to limited overall performance. For pair selection, ignoring the trading performance results in the wrong assets being selected with irrelevant price movements, while the agent trained for trading can overfit to the selected assets without any historical information of other assets. To address this, in this paper, we propose a paradigm for automatic pair trading as a unified task rather than a two-step pipeline. We design a hierarchical reinforcement learning framework to jointly learn and optimize the two subtasks. A high-level policy selects two assets from all possible combinations and a low-level policy then performs a series of trading actions. Experimental results on real-world stock data demonstrate the effectiveness of our method on pair trading compared with both existing pair selection and trading methods.
Submitted 5 February, 2023; v1 submitted 25 January, 2023;
originally announced January 2023.
-
Double-jump stochastic volatility model for VIX: evidence from VVIX
Authors:
Xin Zang,
Jun Ni,
Jing-Zhi Huang,
Lan Wu
Abstract:
The paper studies the continuous-time dynamics of VIX with stochastic volatility and jumps in VIX and volatility. Built on the general parametric affine model with stochastic volatility and jump in logarithm of VIX, we derive a linear relation between the stochastic volatility factor and VVIX index. We detect the existence of co-jump of VIX and VVIX and put forward a double-jump stochastic volatility model for VIX through its joint property with VVIX. With VVIX index as a proxy for the stochastic volatility, we use MCMC method to estimate the dynamics of VIX. Comparing nested models on VIX, we show the jump in VIX and the volatility factor is statistically significant. The jump intensity is also state-dependent. We analyze the impact of jump factor on the VIX dynamics.
Submitted 1 July, 2015; v1 submitted 24 June, 2015;
originally announced June 2015.
-
Optimal dual martingales, their analysis and application to new algorithms for Bermudan products
Authors:
John Schoenmakers,
Junbo Huang,
Jianing Zhang
Abstract:
In this paper we introduce and study the concept of optimal and surely optimal dual martingales in the context of dual valuation of Bermudan options, and outline the development of new algorithms in this context. We provide a characterization theorem, a theorem which gives conditions for a martingale to be surely optimal, and a stability theorem concerning martingales which are close to being surely optimal in a sense. Guided by these results we develop a framework of backward algorithms for constructing such a martingale. In turn this martingale may then be utilized for computing an upper bound of the Bermudan product. The methodology is pure dual in the sense that it doesn't require certain input approximations to the Snell envelope. In an Itô-Lévy environment we outline a particular regression based backward algorithm which allows for computing dual upper bounds without nested Monte Carlo simulation. Moreover, as a by-product this algorithm also provides approximations to the continuation values of the product, which in turn determine a stopping policy. Hence, we may obtain lower bounds at the same time. In a first numerical study we demonstrate the backward dual regression algorithm in a Wiener environment on well-known benchmark examples. It turns out that the method is at least comparable to the one in Belomestny et al. (2009) regarding accuracy, but regarding computational robustness there are even several advantages.
Submitted 13 February, 2012; v1 submitted 25 November, 2011;
originally announced November 2011.
-
Optimal dividend and investing control of a insurance company with higher solvency constraints
Authors:
Zongxia Liang,
Jianping Huang
Abstract:
This paper considers the optimal control problem of a large insurance company under a fixed insolvency probability. The company controls the proportional reinsurance rate, dividend pay-outs and investing process to maximize the expected present value of the dividend pay-outs until the time of bankruptcy. This paper aims at describing the optimal return function as well as the optimal policy. As a by-product, the paper theoretically sets a risk-based capital standard to ensure that the capital requirement can cover the total risk.
Submitted 31 May, 2010; v1 submitted 8 May, 2010;
originally announced May 2010.