BloombergGPT: A Large Language Model for Finance

Authors: Shijie Wu, Ozan Irsoy, Steven Lu, Vadim Dabravolski, Mark Dredze, Sebastian Gehrmann, Prabhanjan Kambadur, David Rosenberg, Gideon Mann

Abstract: The use of NLP in the realm of financial technology is broad and complex, with applications ranging from sentiment analysis and named entity recognition to question answering. Large Language Models (LLMs) have been shown to be effective on a variety of tasks; however, no LLM specialized for the financial domain has been reported in literature. In this work, we present BloombergGPT, a 50 billion parameter language model that is trained on a wide range of financial data. We construct a 363 billion token dataset based on Bloomberg's extensive data sources, perhaps the largest domain-specific dataset yet, augmented with 345 billion tokens from general purpose datasets. We validate BloombergGPT on standard LLM benchmarks, open financial benchmarks, and a suite of internal benchmarks that most accurately reflect our intended usage. Our mixed dataset training leads to a model that outperforms existing models on financial tasks by significant margins without sacrificing performance on general LLM benchmarks. Additionally, we explain our modeling choices, training process, and evaluation methodology. We release Training Chronicles (Appendix C) detailing our experience in training BloombergGPT.

Submitted 21 December, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

Comments: Updated to include Training Chronicles (Appendix C)

arXiv:1805.02393 [pdf, other]

doi 10.1145/3209978.3210031

Weakly-supervised Contextualization of Knowledge Graph Facts

Authors: Nikos Voskarides, Edgar Meij, Ridho Reinanda, Abhinav Khaitan, Miles Osborne, Giorgio Stefanoni, Prabhanjan Kambadur, Maarten de Rijke

Abstract: Knowledge graphs (KGs) model facts about the world, they consist of nodes (entities such as companies and people) that are connected by edges (relations such as founderOf). Facts encoded in KGs are frequently used by search applications to augment result pages. When presenting a KG fact to the user, providing other facts that are pertinent to that main fact can enrich the user experience and support exploratory information needs. KG fact contextualization is the task of augmenting a given KG fact with additional and useful KG facts. The task is challenging because of the large size of KGs, discovering other relevant facts even in a small neighborhood of the given fact results in an enormous amount of candidates. We introduce a neural fact contextualization method (NFCM) to address the KG fact contextualization task. NFCM first generates a set of candidate facts in the neighborhood of a given fact and then ranks the candidate facts using a supervised learning to rank model. The ranking model combines features that we automatically learn from data and that represent the query-candidate facts with a set of hand-crafted features we devised or adjusted for this task. In order to obtain the annotations required to train the learning to rank model at scale, we generate training data automatically using distant supervision on a large entity-tagged text corpus. We show that ranking functions learned on this data are effective at contextualizing KG facts. Evaluation using human assessors shows that it significantly outperforms several competitive baselines.

Submitted 8 July, 2018; v1 submitted 7 May, 2018; originally announced May 2018.

Comments: SIGIR 2018: 41st international ACM SIGIR conference on Research and Development in Information Retrieval. July version: corrected typos

arXiv:1703.03389 [pdf, other]

Faster Greedy MAP Inference for Determinantal Point Processes

Authors: Insu Han, Prabhanjan Kambadur, Kyoungsoo Park, Jinwoo Shin

Abstract: Determinantal point processes (DPPs) are popular probabilistic models that arise in many machine learning tasks, where distributions of diverse sets are characterized by matrix determinants. In this paper, we develop fast algorithms to find the most likely configuration (MAP) of large-scale DPPs, which is NP-hard in general. Due to the submodular nature of the MAP objective, greedy algorithms have been used with empirical success. Greedy implementations require computation of log-determinants, matrix inverses or solving linear systems at each iteration. We present faster implementations of the greedy algorithms by utilizing the complementary benefits of two log-determinant approximation schemes: (a) first-order expansions to the matrix log-determinant function and (b) high-order expansions to the scalar log function with stochastic trace estimators. In our experiments, our algorithms are orders of magnitude faster than their competitors, while sacrificing marginal accuracy.

Submitted 13 June, 2017; v1 submitted 9 March, 2017; originally announced March 2017.

arXiv:1606.01530 [pdf, other]

Adaptive Submodular Ranking and Routing

Authors: Fatemeh Navidi, Prabhanjan Kambadur, Viswanath Nagarajan

Abstract: We study a general stochastic ranking problem where an algorithm needs to adaptively select a sequence of elements so as to "cover" a random scenario (drawn from a known distribution) at minimum expected cost. The coverage of each scenario is captured by an individual submodular function, where the scenario is said to be covered when its function value goes above a given threshold. We obtain a logarithmic factor approximation algorithm for this adaptive ranking problem, which is the best possible (unless P=NP). This problem unifies and generalizes many previously studied problems with applications in search ranking and active learning. The approximation ratio of our algorithm either matches or improves the best result known in each of these special cases. Furthermore, we extend our results to an adaptive vehicle routing problem, where costs are determined by an underlying metric. This routing problem is a significant generalization of the previously-studied adaptive traveling salesman and traveling repairman problems. Our approximation ratio nearly matches the best bound known for these special cases. Finally, we present experimental results for some applications of adaptive ranking.

Submitted 5 February, 2019; v1 submitted 5 June, 2016; originally announced June 2016.

arXiv:1503.00374 [pdf, other]

A Randomized Algorithm for Approximating the Log Determinant of a Symmetric Positive Definite Matrix

Authors: Christos Boutsidis, Petros Drineas, Prabhanjan Kambadur, Eugenia-Maria Kontopoulou, Anastasios Zouzias

Abstract: We introduce a novel algorithm for approximating the logarithm of the determinant of a symmetric positive definite (SPD) matrix. The algorithm is randomized and approximates the traces of a small number of matrix powers of a specially constructed matrix, using the method of Avron and Toledo~\cite{AT11}. From a theoretical perspective, we present additive and relative error bounds for our algorithm. Our additive error bound works for any SPD matrix, whereas our relative error bound works for SPD matrices whose eigenvalues lie in the interval $(θ_1,1)$, with $0<θ_1<1$; the latter setting was proposed in~\cite{icml2015_hana15}. From an empirical perspective, we demonstrate that a C++ implementation of our algorithm can approximate the logarithm of the determinant of large matrices very accurately in a matter of seconds.

Submitted 31 August, 2016; v1 submitted 1 March, 2015; originally announced March 2015.

Comments: working paper

arXiv:1404.2910 [pdf, other]

Statistical Tests for Large Tree-structured Data

Authors: Karthik Bharath, Prabhanjan Kambadur, Dipak K. Dey, Arvind Rao, Veerabhadran Baladandayuthapani

Abstract: We develop a general statistical framework for the analysis and inference of large tree-structured data, with a focus on developing asymptotic goodness-of-fit tests. We first propose a consistent statistical model for binary trees, from which we develop a class of invariant tests. Using the model for binary trees, we then construct tests for general trees by using the distributional properties of the Continuum Random Tree, which arises as the invariant limit for a broad class of models for tree-structured data based on conditioned Galton--Watson processes. The test statistics for the goodness-of-fit tests are simple to compute and are asymptotically distributed as $χ^2$ and $F$ random variables. We illustrate our methods on an important application of detecting tumour heterogeneity in brain cancer. We use a novel approach with tree-based representations of magnetic resonance images and employ the developed tests to ascertain tumor heterogeneity between two groups of patients.

Submitted 20 September, 2016; v1 submitted 10 April, 2014; originally announced April 2014.

arXiv:1311.2854 [pdf, other]

Spectral Clustering via the Power Method -- Provably

Authors: Christos Boutsidis, Alex Gittens, Prabhanjan Kambadur

Abstract: Spectral clustering is one of the most important algorithms in data mining and machine intelligence; however, its computational complexity limits its application to truly large scale data analysis. The computational bottleneck in spectral clustering is computing a few of the top eigenvectors of the (normalized) Laplacian matrix corresponding to the graph representing the data to be clustered. One way to speed up the computation of these eigenvectors is to use the "power method" from the numerical linear algebra literature. Although the power method has been empirically used to speed up spectral clustering, the theory behind this approach, to the best of our knowledge, remains unexplored. This paper provides the \emph{first} such rigorous theoretical justification, arguing that a small number of power iterations suffices to obtain near-optimal partitionings using the approximate eigenvectors. Specifically, we prove that solving the $k$-means clustering problem on the approximate eigenvectors obtained via the power method gives an additive-error approximation to solving the $k$-means problem on the optimal eigenvectors.

Submitted 12 May, 2015; v1 submitted 12 November, 2013; originally announced November 2013.

Comments: ICML 2015, to appear

arXiv:1211.1658 [pdf, ps, other]

Extending Task Parallelism for Frequent Pattern Mining

Authors: Prabhanjan Kambadur, Amol Ghoting, Anshul Gupta, Andrew Lumsdaine

Abstract: Algorithms for frequent pattern mining, a popular informatics application, have unique requirements that are not met by any of the existing parallel tools. In particular, such applications operate on extremely large data sets and have irregular memory access patterns. For efficient parallelization of such applications, it is necessary to support dynamic load balancing along with scheduling mechanisms that allow users to exploit data locality. Given these requirements, task parallelism is the most promising of the available parallel programming models. However, existing solutions for task parallelism schedule tasks implicitly and hence, custom scheduling policies that can exploit data locality cannot be easily employed. In this paper we demonstrate and characterize the speedup obtained in a frequent pattern mining application using a custom clustered scheduling policy in place of the popular Cilk-style policy. We present PFunc, a novel task parallel library whose customizable task scheduling and task priorities facilitated the implementation of our clustered scheduling policy.

Submitted 7 November, 2012; originally announced November 2012.

arXiv:1210.1190 [pdf, ps, other]

Fast Conical Hull Algorithms for Near-separable Non-negative Matrix Factorization

Authors: Abhishek Kumar, Vikas Sindhwani, Prabhanjan Kambadur

Abstract: The separability assumption (Donoho & Stodden, 2003; Arora et al., 2012) turns non-negative matrix factorization (NMF) into a tractable problem. Recently, a new class of provably-correct NMF algorithms have emerged under this assumption. In this paper, we reformulate the separable NMF problem as that of finding the extreme rays of the conical hull of a finite set of vectors. From this geometric perspective, we derive new separable NMF algorithms that are highly scalable and empirically noise robust, and have several other favorable properties in relation to existing methods. A parallel implementation of our algorithm demonstrates high scalability on shared- and distributed-memory machines.

Submitted 3 October, 2012; originally announced October 2012.

Comments: 15 pages, 6 figures

Showing 1–9 of 9 results for author: Kambadur, P