
Showing 1–18 of 18 results for author: Mehta, D

Searching in archive q-fin.
  1. arXiv:2502.01495  [pdf, other]

    q-fin.ST q-fin.CP q-fin.RM q-fin.TR stat.ML

    Supervised Similarity for High-Yield Corporate Bonds with Quantum Cognition Machine Learning

    Authors: Joshua Rosaler, Luca Candelori, Vahagn Kirakosyan, Kharen Musaelian, Ryan Samson, Martin T. Wells, Dhagash Mehta, Stefano Pasquali

    Abstract: We investigate the application of quantum cognition machine learning (QCML), a novel paradigm for both supervised and unsupervised learning tasks rooted in the mathematical formalism of quantum theory, to distance metric learning in corporate bond markets. Compared to equities, corporate bonds are relatively illiquid and both trade and quote data in these securities are relatively sparse. Thus, a…

    Submitted 3 February, 2025; originally announced February 2025.

  2. arXiv:2412.15298  [pdf, ps, other]

    cs.CL cs.AI cs.LG q-fin.ST stat.ME

    A Comparative Study of DSPy Teleprompter Algorithms for Aligning Large Language Models Evaluation Metrics to Human Evaluation

    Authors: Bhaskarjit Sarmah, Kriti Dutta, Anna Grigoryan, Sachin Tiwari, Stefano Pasquali, Dhagash Mehta

    Abstract: We argue that the Declarative Self-improving Python (DSPy) optimizers are a way to align the large language model (LLM) prompts and their evaluations to the human annotations. We present a comparative analysis of five teleprompter algorithms, namely, Cooperative Prompt Optimization (COPRO), Multi-Stage Instruction Prompt Optimization (MIPRO), BootstrapFewShot, BootstrapFewShot with Optuna, and K-N…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: 7 pages, 10 tables, two-column format

  3. arXiv:2412.12148  [pdf, other]

    stat.ML cs.CL cs.LG q-fin.ST stat.AP

    How to Choose a Threshold for an Evaluation Metric for Large Language Models

    Authors: Bhaskarjit Sarmah, Mingshu Li, Jingrao Lyu, Sebastian Frank, Nathalia Castellanos, Stefano Pasquali, Dhagash Mehta

    Abstract: To ensure and monitor large language models (LLMs) reliably, various evaluation metrics have been proposed in the literature. However, there is little research on prescribing a methodology to identify a robust threshold on these metrics even though there are many serious implications of an incorrect choice of the thresholds during deployment of the LLMs. Translating the traditional model risk mana…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: 16 pages, 8 figures, 4 tables. 2-columns

  4. arXiv:2408.10340  [pdf, other]

    stat.ML cs.LG q-fin.ST stat.AP

    Can an unsupervised clustering algorithm reproduce a categorization system?

    Authors: Nathalia Castellanos, Dhruv Desai, Sebastian Frank, Stefano Pasquali, Dhagash Mehta

    Abstract: Peer analysis is a critical component of investment management, often relying on expert-provided categorization systems. These systems' consistency is questioned when they do not align with cohorts from unsupervised clustering algorithms optimized for various metrics. We investigate whether unsupervised clustering can reproduce ground truth classes in a labeled dataset, showing that success depend…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 9 pages, 4 tables, 28 figures

  5. arXiv:2408.06679  [pdf, other]

    cs.LG q-fin.ST stat.ML

    Case-based Explainability for Random Forest: Prototypes, Critics, Counter-factuals and Semi-factuals

    Authors: Gregory Yampolsky, Dhruv Desai, Mingshu Li, Stefano Pasquali, Dhagash Mehta

    Abstract: The explainability of black-box machine learning algorithms, commonly known as Explainable Artificial Intelligence (XAI), has become crucial for financial and other regulated industrial applications due to regulatory requirements and the need for transparency in business practices. Among the various paradigms of XAI, Explainable Case-Based Reasoning (XCBR) stands out as a pragmatic approach that e…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 8 pages, 2 figures, 5 tables

  6. arXiv:2408.04948  [pdf, other]

    cs.CL cs.LG q-fin.ST stat.AP stat.ML

    HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction

    Authors: Bhaskarjit Sarmah, Benika Hall, Rohan Rao, Sunil Patel, Stefano Pasquali, Dhagash Mehta

    Abstract: Extraction and interpretation of intricate information from unstructured text data arising in financial applications, such as earnings call transcripts, present substantial challenges to large language models (LLMs) even using the current best practices to use Retrieval Augmented Generation (RAG) (referred to as VectorRAG techniques which utilize vector databases for information retrieval) due to…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 9 pages, 2 figures, 5 tables

  7. arXiv:2408.02355  [pdf, other]

    stat.ML cs.LG q-fin.ST q-fin.TR

    Quantile Regression using Random Forest Proximities

    Authors: Mingshu Li, Bhaskarjit Sarmah, Dhruv Desai, Joshua Rosaler, Snigdha Bhagat, Philip Sommer, Dhagash Mehta

    Abstract: Due to the dynamic nature of financial markets, maintaining models that produce precise predictions over time is difficult. Often the goal isn't just point prediction but determining uncertainty. Quantifying uncertainty, especially the aleatoric uncertainty due to the unpredictable nature of market drivers, helps investors understand varying risk levels. Recently, quantile regression forests (QRF)…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 9 pages, 5 figures, 3 tables

  8. arXiv:2408.02273  [pdf, other]

    q-fin.ST q-fin.TR stat.AP

    Machine Learning-based Relative Valuation of Municipal Bonds

    Authors: Preetha Saha, Jingrao Lyu, Dhruv Desai, Rishab Chauhan, Jerinsh Jeyapaulraj, Philip Sommer, Dhagash Mehta

    Abstract: The trading ecosystem of the Municipal (muni) bond is complex and unique. With nearly 2% of securities from over a million securities outstanding trading daily, determining the value or relative value of a bond among its peers is challenging. Traditionally, relative value calculation has been done using rule-based or heuristics-driven approaches, which may introduce human biases and often fail to…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 9 pages, 7 tables, 8 figures

  9. arXiv:2310.12428  [pdf, other]

    stat.ML cs.AI cs.LG q-fin.ST stat.ME

    Enhanced Local Explainability and Trust Scores with Random Forest Proximities

    Authors: Joshua Rosaler, Dhruv Desai, Bhaskarjit Sarmah, Dimitrios Vamvourellis, Deran Onay, Dhagash Mehta, Stefano Pasquali

    Abstract: We initiate a novel approach to explain the predictions and out of sample performance of random forest (RF) regression and classification models by exploiting the fact that any RF can be mathematically formulated as an adaptive weighted K nearest-neighbors model. Specifically, we employ a recent result that, for both regression and classification tasks, any RF prediction can be rewritten exactly a…

    Submitted 5 August, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: 5 pages, 6 figures

  10. arXiv:2310.10760  [pdf, other]

    cs.CL q-fin.PM q-fin.ST stat.AP

    Towards reducing hallucination in extracting information from financial reports using Large Language Models

    Authors: Bhaskarjit Sarmah, Tianjie Zhu, Dhagash Mehta, Stefano Pasquali

    Abstract: For a financial analyst, the question and answer (Q&A) segment of the company financial report is a crucial piece of information for various analysis and investment decisions. However, extracting valuable insights from the Q&A section has posed considerable challenges as the conventional methods such as detailed reading and note-taking lack scalability and are susceptible to human errors, and Op…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: 4 pages + references. Accepted for publication in Workshop on Generative AI at the 3rd International Conference on AI-ML Systems 2023, Bengaluru, India

  11. arXiv:2308.08031  [pdf, other]

    q-fin.ST q-fin.CP stat.AP

    Company Similarity using Large Language Models

    Authors: Dimitrios Vamvourellis, Máté Toth, Snigdha Bhagat, Dhruv Desai, Dhagash Mehta, Stefano Pasquali

    Abstract: Identifying companies with similar profiles is a core task in finance with a wide range of applications in portfolio construction, asset pricing and risk attribution. When a rigorous definition of similarity is lacking, financial analysts usually resort to 'traditional' industry classifications such as Global Industry Classification System (GICS) which assign a unique category to each company at d…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: 8 pages, 2 figures, 2 tables

  12. arXiv:2308.06882  [pdf, other]

    q-fin.ST cs.LG q-fin.CP stat.AP

    Quantifying Outlierness of Funds from their Categories using Supervised Similarity

    Authors: Dhruv Desai, Ashmita Dhiman, Tushar Sharma, Deepika Sharma, Dhagash Mehta, Stefano Pasquali

    Abstract: Mutual fund categorization has become a standard tool for the investment management industry and is extensively used by allocators for portfolio construction and manager selection, as well as by fund managers for peer analysis and competitive positioning. As a result, a (unintended) miscategorization or lack of precision can significantly impact allocation decisions and investment fund managers. H…

    Submitted 13 August, 2023; originally announced August 2023.

    Comments: 8 pages, 5 tables, 8 figures

  13. arXiv:2207.07183  [pdf, other]

    q-fin.CP q-fin.ST stat.AP

    Learning Embedded Representation of the Stock Correlation Matrix using Graph Machine Learning

    Authors: Bhaskarjit Sarmah, Nayana Nair, Dhagash Mehta, Stefano Pasquali

    Abstract: Understanding non-linear relationships among financial instruments has various applications in investment processes ranging from risk management, portfolio construction and trading strategies. Here, we focus on interconnectedness among stocks based on their correlation matrix which we represent as a network with the nodes representing individual stocks and the weighted links between pairs of nodes…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: 8 pages, 2 column format, 3 figures, 7 tables

  14. arXiv:2207.04959  [pdf, other]

    q-fin.CP q-fin.ST stat.ML

    Learning Mutual Fund Categorization using Natural Language Processing

    Authors: Dimitrios Vamvourellis, Mate Attila Toth, Dhruv Desai, Dhagash Mehta, Stefano Pasquali

    Abstract: Categorization of mutual funds or Exchange-Traded-funds (ETFs) have long served the financial analysts to perform peer analysis for various purposes starting from competitor analysis, to quantifying portfolio diversification. The categorization methodology usually relies on fund composition data in the structured format extracted from the Form N-1A. Here, we initiate a study to learn the categoriz…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: 8 pages, 5 figures, 2-column format

  15. arXiv:2207.04368  [pdf, other]

    q-fin.CP q-fin.ST q-fin.TR

    Supervised similarity learning for corporate bonds using Random Forest proximities

    Authors: Jerinsh Jeyapaulraj, Dhruv Desai, Peter Chu, Dhagash Mehta, Stefano Pasquali, Philip Sommer

    Abstract: Financial literature consists of ample research on similarity and comparison of financial assets and securities such as stocks, bonds, mutual funds, etc. However, going beyond correlations or aggregate statistics has been arduous since financial datasets are noisy, lack useful features, have missing data and often lack ground truth or annotated labels. However, though similarity extrapolated from…

    Submitted 25 October, 2022; v1 submitted 9 July, 2022; originally announced July 2022.

    Comments: A few minor typos corrected, 1 figure added. Conclusions unchanged. Matching with the accepted version

  16. arXiv:2107.05592  [pdf, other]

    q-fin.ST q-fin.CP stat.AP

    Investor Behavior Modeling by Analyzing Financial Advisor Notes: A Machine Learning Perspective

    Authors: Cynthia Pagliaro, Dhagash Mehta, Han-Tai Shiao, Shaofei Wang, Luwei Xiong

    Abstract: Modeling investor behavior is crucial to identifying behavioral coaching opportunities for financial advisors. With the help of natural language processing (NLP) we analyze an unstructured (textual) dataset of financial advisors' summary notes, taken after every investor conversation, to gain first ever insights into advisor-investor interactions. These insights are used to predict investor needs…

    Submitted 12 July, 2021; originally announced July 2021.

    Comments: 8 pages, 2 column format, 7 figures+5 tables

  17. arXiv:2106.12987  [pdf, other]

    q-fin.ST cs.LG q-fin.CP stat.AP

    Fund2Vec: Mutual Funds Similarity using Graph Learning

    Authors: Vipul Satone, Dhruv Desai, Dhagash Mehta

    Abstract: Identifying similar mutual funds with respect to the underlying portfolios has found many applications in financial services ranging from fund recommender systems, competitors analysis, portfolio analytics, marketing and sales, etc. The traditional methods are either qualitative, and hence prone to biases and often not reproducible, or, are known not to capture all the nuances (non-linearities) am…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: 2 column format, 8 pages, 8 figures, 5 tables

  18. arXiv:2006.00123  [pdf, other]

    q-fin.ST cs.LG q-fin.CP stat.ML

    Machine Learning Fund Categorizations

    Authors: Dhagash Mehta, Dhruv Desai, Jithin Pradeep

    Abstract: Given the surge in popularity of mutual funds (including exchange-traded funds (ETFs)) as a diversified financial investment, a vast variety of mutual funds from various investment management firms and diversification strategies have become available in the market. Identifying similar mutual funds among such a wide landscape of mutual funds has become more important than ever because of many appli…

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: 8 pages, 2-column format, 5 figures