-
Representation Learning of Structured Data for Medical Foundation Models
Authors:
Vijay Prakash Dwivedi,
Viktor Schlegel,
Andy T. Liu,
Thanh-Tung Nguyen,
Abhinav Ramesh Kashyap,
Jeng Wei,
Wei-Hsian Yin,
Stefan Winkler,
Robby T. Tan
Abstract:
Large Language Models (LLMs) have demonstrated remarkable performance across various domains, including healthcare. However, their ability to effectively represent structured non-textual data, such as the alphanumeric medical codes used in records like ICD-10 or SNOMED-CT, is limited and has been particularly exposed in recent research. This paper examines the challenges LLMs face in processing medical codes due to the shortcomings of current tokenization methods. As a result, we introduce the UniStruct architecture to design a multimodal medical foundation model of unstructured text and structured data, which addresses these challenges by adapting subword tokenization techniques specifically for the structured medical codes. Our approach is validated through model pre-training on both an extensive internal medical database and a public repository of structured medical records. Trained on over 1 billion tokens on the internal medical database, the proposed model achieves up to a 23% improvement in evaluation metrics, with around 2% gain attributed to our proposed tokenization. Additionally, when evaluated on the EHRSHOT public benchmark with a 1/1000 fraction of the pre-training data, the UniStruct model improves performance on over 42% of the downstream tasks. Our approach not only enhances the representation and generalization capabilities of patient-centric models but also bridges a critical gap in representation learning models' ability to handle complex structured medical data, alongside unstructured text.
Submitted 17 October, 2024;
originally announced October 2024.
-
MEDSAGE: Enhancing Robustness of Medical Dialogue Summarization to ASR Errors with LLM-generated Synthetic Dialogues
Authors:
Kuluhan Binici,
Abhinav Ramesh Kashyap,
Viktor Schlegel,
Andy T. Liu,
Vijay Prakash Dwivedi,
Thanh-Tung Nguyen,
Xiaoxue Gao,
Nancy F. Chen,
Stefan Winkler
Abstract:
Automatic Speech Recognition (ASR) systems are pivotal in transcribing speech into text, yet the errors they introduce can significantly degrade the performance of downstream tasks like summarization. This issue is particularly pronounced in clinical dialogue summarization, a low-resource domain where supervised data for fine-tuning is scarce, necessitating the use of ASR models as black-box solutions. Employing conventional data augmentation for enhancing the noise robustness of summarization models is not feasible either due to the unavailability of sufficient medical dialogue audio recordings and corresponding ASR transcripts. To address this challenge, we propose MEDSAGE, an approach for generating synthetic samples for data augmentation using Large Language Models (LLMs). Specifically, we leverage the in-context learning capabilities of LLMs and instruct them to generate ASR-like errors based on a few available medical dialogue examples with audio recordings. Experimental results show that LLMs can effectively model ASR noise, and incorporating this noisy data into the training process significantly improves the robustness and accuracy of medical dialogue summarization systems. This approach addresses the challenges of noisy ASR outputs in critical applications, offering a robust solution to enhance the reliability of clinical dialogue summarization.
Submitted 8 January, 2025; v1 submitted 26 August, 2024;
originally announced August 2024.
-
To Trade Or Not To Trade: Cascading Waterfall Round Robin Rebalancing Mechanism for Cryptocurrencies
Authors:
Ravi Kashyap
Abstract:
We have designed an innovative portfolio rebalancing mechanism termed the Cascading Waterfall Round Robin Mechanism. This algorithmic approach recommends an ideal size and number of trades for each asset during the periodic rebalancing process, factoring in the gas fee and slippage. The essence of the model we have created gives indications regarding whether trades should be made on individual assets depending on the uncertainty in the micro - asset level characteristics - and macro - aggregate market factors - environments. In the hyper-volatile crypto market, our approach to daily rebalancing will benefit from volatility. Price movements will cause our algorithm to buy assets that drop in prices and sell as they soar. In fact, the buying and selling happen only when certain boundaries are crossed in order to weed out any market noise and ensure sound trade execution. We have provided several numerical examples to illustrate the steps - including the calculation of several intermediate variables - of our rebalancing mechanism. The Algorithm we have developed can be easily applied outside blockchain to investment funds across all asset classes at any trading frequency and rebalancing duration.
Shakespeare As A Crypto Trader:
To Trade Or Not To Trade, that is the Question,
Whether an Optimizer can Yield the Answer,
Against the Spikes and Crashes of Markets Gone Wild,
To Quench One's Thirst before Liquidity Runs Dry,
Or Wait till the Tide of Momentum turns Mild.
Submitted 17 May, 2024;
originally announced July 2024.
-
The Blockchain Risk Parity Line: Moving From The Efficient Frontier To The Final Frontier Of Investments
Authors:
Ravi Kashyap
Abstract:
We engineer blockchain based risk managed portfolios by creating three funds with distinct risk and return profiles: 1) Alpha - high risk portfolio; 2) Beta - mimics the wider market; and 3) Gamma - represents the risk free rate adjusted to beat inflation. Each of the sub-funds (Alpha, Beta and Gamma) provides risk parity because the weight of each asset in the corresponding portfolio is set to be inversely proportional to the risk derived from investing in that asset. This can be equivalently stated as equal risk contributions from each asset towards the overall portfolio risk.
We provide detailed mechanics of combining assets - including mathematical formulations - to obtain better risk managed portfolios. The descriptions are intended to show how a risk parity based efficient frontier portfolio management engine - that caters to different risk appetites of investors by letting each individual investor select their preferred risk-return combination - can be created seamlessly on blockchain.
Any Investor - using decentralized ledger technology - can select their desired level of risk, or return, and allocate their wealth accordingly among the sub funds, which balance one another under different market conditions. This evolution of the risk parity principle - resulting in a mechanism that is geared to do well under all market cycles - brings more robust performance and can be termed as conceptual parity.
We have given several numerical examples that illustrate the various scenarios that arise when combining Alpha, Beta and Gamma to obtain Parity.
The final investment frontier is now possible - a modification to the efficient frontier, thus becoming more than a mere theoretical construct - on blockchain since anyone from anywhere can participate at anytime to obtain wealth appreciation based on their financial goals.
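The weighting rule stated in the abstract above (each asset's weight inversely proportional to its risk) can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `inverse_risk_weights`, the use of volatility as the risk measure, and the sample numbers are all assumptions made for illustration, and correlations between assets are ignored, which is the simplified case in which inverse-risk weights give exactly equal risk contributions.

```python
def inverse_risk_weights(risks):
    """Return weights proportional to 1/risk, normalised to sum to 1.

    `risks` is a list of positive per-asset risk measures
    (assumed here to be volatilities; the paper may use another measure).
    """
    if any(r <= 0 for r in risks):
        raise ValueError("each risk measure must be positive")
    inverse = [1.0 / r for r in risks]
    total = sum(inverse)
    return [x / total for x in inverse]


# Hypothetical volatilities for three assets (illustrative numbers only).
vols = [0.10, 0.20, 0.40]
weights = inverse_risk_weights(vols)

# Ignoring correlations, each asset's stand-alone risk contribution
# weight * volatility comes out the same for every asset.
contributions = [w * v for w, v in zip(weights, vols)]
```

In this sketch the least risky asset receives the largest weight, and the products of weight and volatility are equal across assets, which is the "equal risk contributions" reading given in the abstract.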
Submitted 26 June, 2024;
originally announced July 2024.
-
M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering
Authors:
Anand Subramanian,
Viktor Schlegel,
Abhinav Ramesh Kashyap,
Thanh-Tung Nguyen,
Vijay Prakash Dwivedi,
Stefan Winkler
Abstract:
There is active research on adapting Large Language Models (LLMs) to perform a variety of tasks in high-stakes domains such as healthcare. Despite their popularity, there is a lack of understanding of the extent and contributing factors that allow LLMs to recall relevant knowledge and combine it with presented information in the clinical and biomedical domain: a fundamental pre-requisite for success on down-stream tasks. Addressing this gap, we use Multiple Choice and Abstractive Question Answering to conduct a large-scale empirical study on 22 datasets in three generalist and three specialist biomedical sub-domains. Our multifaceted analysis of the performance of 15 LLMs, further broken down by sub-domain, source of knowledge and model architecture, uncovers success factors such as instruction tuning that lead to improved recall and comprehension. We further show that while recently proposed domain-adapted models may lack adequate knowledge, directly fine-tuning on our collected medical knowledge datasets shows encouraging results, even generalising to unseen specialist sub-domains. We complement the quantitative results with a skill-oriented manual error analysis, which reveals a significant gap between the models' capabilities to simply recall necessary knowledge and to integrate it with the presented context. To foster research and collaboration in this field we share M-QALM, our resources, standardised methodology, and evaluation results, with the research community to facilitate further advancements in clinical knowledge representation learning within language models.
Submitted 5 June, 2024;
originally announced June 2024.
-
The Democratization of Wealth Management: Hedged Mutual Fund Blockchain Protocol
Authors:
Ravi Kashyap
Abstract:
We develop several innovations to bring the best practices of traditional investment funds to the blockchain landscape. Specifically, we illustrate how: 1) fund prices can be updated regularly like mutual funds; 2) performance fees can be charged like hedge funds; 3) mutually hedged blockchain investment funds can operate with investor protection schemes, such as high water marks; and 4) measures to offset trading related slippage costs when redemptions happen. Using our concepts - and blockchain technology - traditional funds can calculate performance fees in a simplified manner and alleviate several operational issues. Blockchain can solve many problems for traditional finance, while tried and tested wealth management techniques can benefit decentralization, speeding its adoption. We provide detailed steps - including mathematical formulations and instructive pointers - to implement these ideas and discuss how our designs overcome several blockchain bottlenecks, making smart contracts smarter. We provide numerical illustrations of several scenarios related to our mechanisms.
Submitted 29 July, 2024; v1 submitted 12 March, 2024;
originally announced May 2024.
-
Automated Clinical Coding for Outpatient Departments
Authors:
Viktor Schlegel,
Abhinav Ramesh Kashyap,
Thanh-Tung Nguyen,
Tsung-Han Yang,
Vijay Prakash Dwivedi,
Wei-Hsian Yin,
Jeng Wei,
Stefan Winkler
Abstract:
Computerised clinical coding approaches aim to automate the process of assigning a set of codes to medical records. While there is active research pushing the state of the art on clinical coding for hospitalized patients, the outpatient setting -- where doctors tend to non-hospitalised patients -- is overlooked. Although both settings can be formalised as a multi-label classification task, they present unique and distinct challenges, which raises the question of whether the success of inpatient clinical coding approaches translates to the outpatient setting. This paper is the first to investigate how well state-of-the-art deep learning-based clinical coding approaches work in the outpatient setting at hospital scale. To this end, we collect a large outpatient dataset comprising over 7 million notes documenting over half a million patients. We adapt four state-of-the-art clinical coding approaches to this setting and evaluate their potential to assist coders. We find evidence that clinical coding in outpatient settings can benefit from more innovations in popular inpatient coding benchmarks. A deeper analysis of the factors contributing to the success -- amount and form of data and choice of document representation -- reveals the presence of easy-to-solve examples, the coding of which can be completely automated with a low error rate.
Submitted 24 December, 2023; v1 submitted 20 December, 2023;
originally announced December 2023.
-
DeFi Security: Turning The Weakest Link Into The Strongest Attraction
Authors:
Ravi Kashyap
Abstract:
The primary innovation we pioneer -- focused on blockchain information security -- is called the Safe-House. The Safe-House is badly needed since there are many ongoing hacks and security concerns in the DeFi space right now. The Safe-House is a piece of engineering sophistication that utilizes existing blockchain principles to bring about greater security when customer assets are moved around. The Safe-House logic is easily implemented as smart contracts on any decentralized system. The amount of funds at risk from both internal and external parties -- and hence the maximum one time loss -- is guaranteed to stay within the specified limits based on cryptographic fundamentals.
To improve the safety of the Safe-House even further, we adapt the one time password (OTP) concept to operate using blockchain technology. Well suited to blockchain cryptographic nuances, our secondary advancement can be termed the one time next time password (OTNTP) mechanism. The OTNTP is designed to complement the Safe-House, making it even safer.
We provide a detailed threat assessment model -- discussing the risks faced by DeFi protocols and the specific risks that apply to blockchain fund management -- and give technical arguments regarding how these threats can be overcome in a robust manner. We discuss how the Safe-House can participate with other external yield generation protocols in a secure way. We provide reasons for why the Safe-House increases safety without sacrificing the efficiency of operation. We start with a high level intuitive description of the landscape, the corresponding problems and our solutions. We then supplement this overview with detailed discussions including the corresponding mathematical formulations and pointers for technological implementation. This approach ensures that the article is accessible to a broad audience.
Submitted 20 November, 2023;
originally announced December 2023.
-
Arguably Adequate Aqueduct Algorithm: Crossing A Bridge-Less Block-Chain Chasm
Authors:
Ravi Kashyap
Abstract:
We consider the problem of being a cross-chain wealth management platform with deposits, redemptions and investment assets across multiple networks. We discuss the need for blockchain bridges to facilitate fund flows across platforms. We point out several issues with existing bridges. We develop an algorithm - tailored to overcome current constraints - that dynamically changes the utilization of bridge capacities and hence the amounts to be transferred across networks. We illustrate several scenarios using numerical simulations.
Submitted 11 September, 2023;
originally announced November 2023.
-
Neural Discovery of Permutation Subgroups
Authors:
Pavan Karjol,
Rohan Kashyap,
Prathosh A P
Abstract:
We consider the problem of discovering subgroup $H$ of permutation group $S_{n}$. Unlike the traditional $H$-invariant networks wherein $H$ is assumed to be known, we present a method to discover the underlying subgroup, given that it satisfies certain conditions. Our results show that one could discover any subgroup of type $S_{k} (k \leq n)$ by learning an $S_{n}$-invariant function and a linear transformation. We also prove similar results for cyclic and dihedral subgroups. Finally, we provide a general theorem that can be extended to discover other subgroups of $S_{n}$. We also demonstrate the applicability of our results through numerical experiments on image-digit sum and symmetric polynomial regression tasks.
Submitted 11 September, 2023;
originally announced September 2023.
-
A Unified Framework for Discovering Discrete Symmetries
Authors:
Pavan Karjol,
Rohan Kashyap,
Aditya Gopalan,
Prathosh A. P
Abstract:
We consider the problem of learning a function respecting a symmetry from among a class of symmetries. We develop a unified framework that enables symmetry discovery across a broad range of subgroups including locally symmetric, dihedral and cyclic subgroups. At the core of the framework is a novel architecture composed of linear, matrix-valued and non-linear functions that expresses functions invariant to these subgroups in a principled manner. The structure of the architecture enables us to leverage multi-armed bandit algorithms and gradient descent to efficiently optimize over the linear and the non-linear functions, respectively, and to infer the symmetry that is ultimately learnt. We also discuss the necessity of the matrix-valued functions in the architecture. Experiments on image-digit sum and polynomial regression tasks demonstrate the effectiveness of our approach.
Submitted 27 October, 2023; v1 submitted 6 September, 2023;
originally announced September 2023.
-
Gender Gaps in Online Social Connectivity, Promotion and Relocation Reports on LinkedIn
Authors:
Ghazal Kalhor,
Hannah Gardner,
Ingmar Weber,
Ridhi Kashyap
Abstract:
Online professional social networking platforms provide opportunities to expand networks strategically for job opportunities and career advancement. A large body of research shows that women's offline networks are less advantageous than men's. How online platforms such as LinkedIn may reflect or reproduce gendered networking behaviours, or how online social connectivity may affect outcomes differentially by gender is not well understood. This paper analyses aggregate, anonymised data from almost 10 million LinkedIn users in the UK and US information technology (IT) sector collected from the site's advertising platform to explore how being connected to Big Tech companies ('social connectivity') varies by gender, and how gender, age, seniority and social connectivity shape the propensity to report job promotions or relocations. Consistent with previous studies, we find there are fewer women compared to men on LinkedIn in IT. Furthermore, female users are less likely to be connected to Big Tech companies than men. However, when we further analyse recent promotion or relocation reports, we find women are more likely than men to have reported a recent promotion at work, suggesting high-achieving women may be self-selecting onto LinkedIn. Even among this positively selected group, though, we find men are more likely to report a recent relocation. Social connectivity emerges as a significant predictor of promotion and relocation reports, with an interaction effect between gender and social connectivity indicating the payoffs to social connectivity for promotion and relocation reports are larger for women. This suggests that online networking has the potential for larger impacts for women, who experience greater disadvantage in traditional networking contexts, and calls for further research to understand differential impacts of online networking for socially disadvantaged groups.
Submitted 25 August, 2023;
originally announced August 2023.
-
PULSAR at MEDIQA-Sum 2023: Large Language Models Augmented by Synthetic Dialogue Convert Patient Dialogues to Medical Records
Authors:
Viktor Schlegel,
Hao Li,
Yuping Wu,
Anand Subramanian,
Thanh-Tung Nguyen,
Abhinav Ramesh Kashyap,
Daniel Beck,
Xiaojun Zeng,
Riza Theresa Batista-Navarro,
Stefan Winkler,
Goran Nenadic
Abstract:
This paper describes PULSAR, our system submission at the ImageCLEF 2023 MEDIQA-Sum task on summarising patient-doctor dialogues into clinical records. The proposed framework relies on domain-specific pre-training to produce a specialised language model, which is trained on task-specific natural data augmented by synthetic data generated by a black-box LLM. We find limited evidence towards the efficacy of domain-specific pre-training and data augmentation, while scaling up the language model yields the best performance gains. Our approach was ranked second and third among 13 submissions on task B of the challenge. Our code is available at https://github.com/yuping-wu/PULSAR.
Submitted 4 July, 2023;
originally announced July 2023.
-
PULSAR: Pre-training with Extracted Healthcare Terms for Summarising Patients' Problems and Data Augmentation with Black-box Large Language Models
Authors:
Hao Li,
Yuping Wu,
Viktor Schlegel,
Riza Batista-Navarro,
Thanh-Tung Nguyen,
Abhinav Ramesh Kashyap,
Xiaojun Zeng,
Daniel Beck,
Stefan Winkler,
Goran Nenadic
Abstract:
Medical progress notes play a crucial role in documenting a patient's hospital journey, including his or her condition, treatment plan, and any updates for healthcare providers. Automatic summarisation of a patient's problems in the form of a problem list can aid stakeholders in understanding a patient's condition, reducing workload and cognitive bias. BioNLP 2023 Shared Task 1A focuses on generating a list of diagnoses and problems from the provider's progress notes during hospitalisation. In this paper, we introduce our proposed approach to this task, which integrates two complementary components. One component employs large language models (LLMs) for data augmentation; the other is an abstractive summarisation LLM with a novel pre-training objective for generating the patients' problems summarised as a list. Our approach was ranked second among all submissions to the shared task. The performance of our model on the development and test datasets shows that our approach is more robust on unknown data, with an improvement of up to 3.1 points over the same size of the larger model.
Submitted 5 June, 2023;
originally announced June 2023.
-
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation
Authors:
Ambuj Mehrish,
Abhinav Ramesh Kashyap,
Li Yingting,
Navonil Majumder,
Soujanya Poria
Abstract:
There are significant challenges for speaker adaptation in text-to-speech for languages that are not widely spoken or for speakers with accents or dialects that are not well-represented in the training data. To address this issue, we propose the use of the "mixture of adapters" method. This approach involves adding multiple adapters within a backbone-model layer to learn the unique characteristics of different speakers. Our approach outperforms the baseline, with a noticeable improvement of 5% observed in speaker preference tests when using only one minute of data for each new speaker. Moreover, following the adapter paradigm, we fine-tune only the adapter parameters (11% of the total model parameters). This is a significant achievement in parameter-efficient speaker adaptation, and one of the first models of its kind. Overall, our proposed approach offers a promising solution to the speech synthesis techniques, particularly for adapting to speakers from diverse backgrounds.
Submitted 29 May, 2023;
originally announced May 2023.
-
A Comprehensive Survey of Sentence Representations: From the BERT Epoch to the ChatGPT Era and Beyond
Authors:
Abhinav Ramesh Kashyap,
Thanh-Tung Nguyen,
Viktor Schlegel,
Stefan Winkler,
See-Kiong Ng,
Soujanya Poria
Abstract:
Sentence representations are a critical component in NLP applications such as retrieval, question answering, and text classification. They capture the meaning of a sentence, enabling machines to understand and reason over human language. In recent years, significant progress has been made in developing methods for learning sentence representations, including unsupervised, supervised, and transfer learning approaches. However, there has so far been no literature review on sentence representations. In this paper, we provide an overview of the different methods for sentence representation learning, focusing mostly on deep learning models. We provide a systematic organization of the literature, highlighting the key contributions and challenges in this area. Overall, our review highlights the importance of this area in natural language processing, the progress made in sentence representation learning, and the challenges that remain. We conclude with directions for future research, suggesting potential avenues for improving the quality and efficiency of sentence representations.
Submitted 2 February, 2024; v1 submitted 21 May, 2023;
originally announced May 2023.
-
A Tale of Two Currencies: Cash and Crypto
Authors:
Ravi Kashyap
Abstract:
We discuss numerous justifications for why cryptocurrencies would be highly conducive for the smooth functioning of today's society. We provide several comparisons between cryptocurrencies issued by blockchain projects, crypto, and conventional government issued currencies, cash or fiat. We summarize seven fundamental innovations that would be required for participants to have greater confidence in decentralized finance (DeFi) and to obtain wealth appreciation coupled with better risk management. The conceptual ideas we discuss outline an approach to: 1) Strengthened Security Blueprint; 2) Rebalancing and Trade Execution Suited for Blockchain Nuances; 3) Volatility and Variance Adjusted Weight Calculation; 4) Accommodating Investor Preferences and Risk Parity Construction; 5) Profit Sharing and Investor Protection; 6) Concentration Risk Indicator and Performance Metrics; 7) Multi-chain expansion and Select Strategic Initiatives including the notion of a Decentralized Autonomous Organization (DAO). Incorporating these concepts into several projects would also facilitate the growth of the overall blockchain eco-system so that this technology can have wider mainstream adoption and fulfill its potential in transforming all aspects of human interactions.
Submitted 13 February, 2023;
originally announced February 2023.
-
UDApter -- Efficient Domain Adaptation Using Adapters
Authors:
Bhavitvya Malik,
Abhinav Ramesh Kashyap,
Min-Yen Kan,
Soujanya Poria
Abstract:
We propose two methods to make unsupervised domain adaptation (UDA) more parameter efficient using adapters, small bottleneck layers interspersed with every layer of the large-scale pre-trained language model (PLM). The first method deconstructs UDA into a two-step process: first by adding a domain adapter to learn domain-invariant information and then by adding a task adapter that uses domain-invariant information to learn task representations in the source domain. The second method jointly learns a supervised classifier while reducing the divergence measure. Compared to strong baselines, our simple methods perform well on natural language inference (MNLI) and the cross-domain sentiment classification task. We even outperform unsupervised domain adaptation methods such as DANN and DSN in sentiment classification, and we are within 0.85% F1 on the natural language inference task, while fine-tuning only a fraction of the full model parameters. We release our code at https://github.com/declare-lab/domadapter
Submitted 16 February, 2023; v1 submitted 6 February, 2023;
originally announced February 2023.
-
A survey of deep learning optimizers -- first and second order methods
Authors:
Rohan Kashyap
Abstract:
Deep learning optimization involves minimizing a high-dimensional loss function in the weight space, which is often perceived as difficult because of saddle points, local minima, ill-conditioning of the Hessian, and limited compute resources. In this paper, we provide a comprehensive review of 14 standard optimization methods successfully used in deep learning research and a theoretical assessment of the difficulties in numerical optimization from the optimization literature.
Submitted 27 September, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.
-
GPT-Neo for commonsense reasoning -- a theoretical and practical lens
Authors:
Rohan Kashyap,
Vivek Kashyap,
Narendra C. P.
Abstract:
Recent work has demonstrated substantial gains from pre-training large language models (LLMs) followed by supervised fine-tuning on the downstream task. In this paper, we evaluate the performance of the GPT-Neo model on 6 commonsense reasoning benchmark tasks. We aim to examine the performance of smaller models, using the GPT-Neo models, against several larger model baselines such as GPT-3, Llama-2, MPT and Falcon. Upon fine-tuning with the appropriate set of hyperparameters, our model achieves competitive accuracy on several tasks. We also investigate and substantiate our results using attention-head visualization to better understand the model performance. Finally, we conduct several robustness tests to gauge the model performance under numerous settings.
Submitted 27 September, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Authors:
BigScience Workshop,
Teven Le Scao,
Angela Fan,
Christopher Akiki,
Ellie Pavlick,
Suzana Ilić,
Daniel Hesslow,
Roman Castagné,
Alexandra Sasha Luccioni,
François Yvon,
Matthias Gallé,
Jonathan Tow,
Alexander M. Rush,
Stella Biderman,
Albert Webson,
Pawan Sasanka Ammanamanchi,
Thomas Wang,
Benoît Sagot,
Niklas Muennighoff,
Albert Villanova del Moral,
Olatunji Ruwase,
Rachel Bawden,
Stas Bekman,
Angelina McMillan-Major,
et al. (369 additional authors not shown)
Abstract:
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Submitted 27 June, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
So Different Yet So Alike! Constrained Unsupervised Text Style Transfer
Authors:
Abhinav Ramesh Kashyap,
Devamanyu Hazarika,
Min-Yen Kan,
Roger Zimmermann,
Soujanya Poria
Abstract:
Automatic transfer of text between domains has become popular in recent times. One of its aims is to preserve the semantic content of text being translated from source to target domain. However, it does not explicitly maintain other attributes between the source and translated text, e.g., text length and descriptiveness. Maintaining constraints in transfer has several downstream applications, including data augmentation and de-biasing. We introduce a method for such constrained unsupervised text style transfer by introducing two complementary losses to the generative adversarial network (GAN) family of models. Unlike the competing losses used in GANs, we introduce cooperative losses where the discriminator and the generator cooperate and reduce the same loss. The first is a contrastive loss and the second is a classification loss, aiming to regularize the latent space further and bring similar sentences across domains closer together. We demonstrate that such training retains lexical, syntactic, and domain-specific constraints between domains for multiple benchmark datasets, including ones where more than one attribute changes. We show that the complementary cooperative losses improve text quality, according to both automated and human evaluation measures.
Submitted 9 May, 2022;
originally announced May 2022.
-
Adaptive cognitive fit: Artificial intelligence augmented management of information facets and representations
Authors:
Jim Samuel,
Rajiv Kashyap,
Yana Samuel,
Alexander Pelaez
Abstract:
Explosive growth in big data technologies and artificial intelligence [AI] applications has led to increasing pervasiveness of information facets and a rapidly growing array of information representations. Information facets, such as equivocality and veracity, can dominate and significantly influence human perceptions of information and consequently affect human performance. Extant research in cognitive fit, which preceded the big data and AI era, focused on the effects of aligning information representation and task on performance, without sufficient consideration of information facets and attendant cognitive challenges. Therefore, there is a compelling need to understand the interplay of these dominant information facets with information representations and tasks, and their influence on human performance. We suggest that artificially intelligent technologies that can adapt information representations to overcome cognitive limitations are necessary for these complex information environments. To this end, we propose and test a novel *Adaptive Cognitive Fit* [ACF] framework that explains the influence of information facets and AI-augmented information representations on human performance. We draw on information processing theory and cognitive dissonance theory to advance the ACF framework and a set of propositions. We empirically validate the ACF propositions with an economic experiment that demonstrates the influence of information facets, and a machine learning simulation that establishes the viability of using AI to improve human performance.
Submitted 24 April, 2022;
originally announced April 2022.
-
A Comprehensive Study on Various Statistical Techniques for Prediction of Movie Success
Authors:
Manav Agarwal,
Shreya Venugopal,
Rishab Kashyap,
R Bharathi
Abstract:
The film industry is one of the most popular entertainment industries and one of the biggest markets for business. Among the contributing factors to this is the success of a movie in terms of its popularity as well as its box office performance. Hence, we create a comprehensive comparison between various machine learning models to predict the rate of success of a movie. The effectiveness of these models, along with their statistical significance, is studied to conclude which of these models is the best predictor. Some insights regarding factors that affect the success of the movies are also found. The models studied include some Regression models, Machine Learning models, a Time Series model and a Neural Network, with the Neural Network being the best performing model with an accuracy of about 86%. Additionally, as part of the testing, data for the movies released in 2020 are analysed.
Submitted 1 December, 2021;
originally announced December 2021.
-
Domain Divergences: a Survey and Empirical Analysis
Authors:
Abhinav Ramesh Kashyap,
Devamanyu Hazarika,
Min-Yen Kan,
Roger Zimmermann
Abstract:
Domain divergence plays a significant role in estimating the performance of a model in new domains. While there is a significant body of literature on divergence measures, researchers find it hard to choose an appropriate divergence for a given NLP application. We address this shortcoming both by surveying the literature and through an empirical study. We develop a taxonomy of divergence measures consisting of three classes -- Information-theoretic, Geometric, and Higher-order measures -- and identify the relationships between them. Further, to understand the common use-cases of these measures, we recognise three novel applications -- 1) Data Selection, 2) Learning Representation, and 3) Decisions in the Wild -- and use them to organise our literature. From this, we identify that Information-theoretic measures are prevalent for 1) and 3), and Higher-order measures are more common for 2). To further help researchers choose appropriate measures to predict drop in performance -- an important aspect of Decisions in the Wild -- we perform correlation analysis spanning 130 domain adaptation scenarios, 3 varied NLP tasks and 12 divergence measures identified from our survey. To calculate these divergences, we consider the current contextual word representations (CWR) and contrast with the older distributed representations. We find that traditional measures over word distributions still serve as strong baselines, while higher-order measures with CWR are effective.
Submitted 19 April, 2021; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Transparency versus Performance in Financial Markets: The Role of CSR Communications
Authors:
Rajiv Kashyap,
Mohamed Menisy,
Peter Caiazzo,
Jim Samuel
Abstract:
Although companies are exhorted to provide more information to the financial community, it is evident that they choose different paths based upon their strategic emphasis and competitive environments. Our investigation explores the empirical boundary conditions under which firms choose to disclose versus withhold information from investors based upon their strategic emphasis. We found significant differences in terms of voluntary information disclosures between firms that consistently delivered positive earnings surprises versus those that delivered negative earnings surprises. We investigated this effect in a more granular fashion by separately examining differences in environmental, social, and governance disclosures between the two pools of firms. We found that, in essence, the differences remained consistent and positive earnings firms were significantly more likely to disclose information about their ESG activities than their counterparts. Interestingly, none of the measures of financial performance were instrumental in distinguishing between the two pools of firms. However, our measure of reach, as measured by the number of negative news stories, lends credence to our findings. From a fund manager's perspective, this finding should raise an immediate red flag: firms that are likely to underperform are likely to be less transparent than overperformers.
Submitted 8 August, 2020;
originally announced August 2020.
-
That Message Went Viral?! Exploratory Analytics and Sentiment Analysis into the Propagation of Tweets
Authors:
Jim Samuel,
Myles Garvey,
Rajiv Kashyap
Abstract:
Information exchange and message diffusion have moved from traditional media to social media platforms. Messages on platforms such as Twitter have become the default mode of company communications, replacing lengthier public announcements and updates. Businesses and organizations have increased their use of Twitter to connect with stakeholders. As a result, it is important to understand the key drivers of successful information exchange and message diffusion via Twitter. We conducted an exploratory analysis on a dataset of over a million Tweets, comprising over 40,000 lead Tweets, further filtered to over 18,000 Tweets. We identified the most popular messages, and analyzed the Tweets on multiple endogenous dimensions including content, sentiment, motive and richness, and exogenous dimensions such as fundamental events, social learning, and activism. We found some interesting patterns and uncovered new insights to help researchers and practitioners better understand the behavior of popular viral Tweets. We also performed sentiment analysis and present an early stage model to explain Tweet performance.
Submitted 20 April, 2020;
originally announced April 2020.
-
SciWING -- A Software Toolkit for Scientific Document Processing
Authors:
Abhinav Ramesh Kashyap,
Min-Yen Kan
Abstract:
We introduce SciWING, an open-source software toolkit which provides access to pre-trained models for scientific document processing tasks, including citation string parsing and logical structure recovery. SciWING enables researchers to rapidly experiment with different models by swapping and stacking different modules. It also enables them to declare and run models from a configuration file. It enables researchers to perform production-ready transfer learning from general, pre-trained transformers (e.g., BERT, SciBERT), and aids development of end-user applications. It includes ready-to-use web and terminal-based applications and demonstrations (available from http://sciwing.io).
Submitted 23 October, 2020; v1 submitted 8 April, 2020;
originally announced April 2020.
-
Imitation in the Imitation Game
Authors:
Ravi Kashyap
Abstract:
We discuss the objectives of automation equipped with non-trivial decision making, or creating artificial intelligence, in the financial markets and provide a possible alternative. Intelligence might be an unintended consequence of curiosity left to roam free, best exemplified by a frolicking infant. For this unintentional yet welcome aftereffect to set in, a foundational list of guiding principles needs to be present. A consideration of these requirements allows us to propose a test of intelligence for trading programs, along the lines of the Turing Test, long the benchmark for intelligent machines. We discuss the application of this methodology to the dilemma in finance, which is whether, when and how much to Buy, Sell or Hold.
Submitted 3 November, 2019;
originally announced November 2019.
-
Artificial Intelligence: A Child's Play
Authors:
Ravi Kashyap
Abstract:
We discuss the objectives of any endeavor in creating artificial intelligence, AI, and provide a possible alternative. Intelligence might be an unintended consequence of curiosity left to roam free, best exemplified by a frolicking infant. This suggests that our attempts at AI could have been misguided. What we actually need to strive for can be termed artificial curiosity, AC, and intelligence happens as a consequence of those efforts. For this unintentional yet welcome aftereffect to set in, a foundational list of guiding principles needs to be present. We start with the intuition for this line of reasoning and formalize it with a series of definitions, assumptions, ingredients, models and iterative improvements that will be necessary to make the incubation of intelligence a reality. Our discussion provides conceptual modifications to the Turing Test and to Searle's Chinese room argument. We discuss the future implications for society as AI becomes an integral part of life.
We provide a road-map for creating intelligence with the technical parts relegated to the appendix so that the article is accessible to a wide audience. The central techniques in our formal approach to creating intelligence draw upon tools and concepts widely used in physics, cognitive science, psychology, evolutionary biology, statistics, linguistics, communication systems, pattern recognition, marketing, economics, finance, information science and computational theory, highlighting that solutions for creating artificial intelligence have to transcend the artificial barriers between various fields and be highly multi-disciplinary.
Submitted 30 January, 2021; v1 submitted 1 July, 2019;
originally announced July 2019.
-
Auction Theory Adaptations for Real Life Applications
Authors:
Ravi Kashyap
Abstract:
We develop extensions to auction theory results that are useful in real life scenarios.
1. Since valuations are generally positive we first develop approximations using the log-normal distribution. This would be useful for many finance related auction settings since asset prices are usually non-negative.
2. We formulate a positive symmetric discrete distribution, which is likely to be followed by the total number of auction participants, and incorporate this into auction theory results.
3. We develop extensions when the valuations of the bidders are interdependent and incorporate all the results developed into a final combined realistic setting.
4. Our methods can be a practical tool for bidders and auction sellers to maximize their profits. The models developed here could be potentially useful for inventory estimation and for wholesale procurement of financial instruments and also non-financial commodities.
All the propositions are new results and they refer to existing results which are stated as Lemmas.
Submitted 8 May, 2019; v1 submitted 25 September, 2018;
originally announced October 2018.
-
A Study of Associative Evidential Reasoning
Authors:
Yizong Cheng,
Rangasami L. Kashyap
Abstract:
Evidential reasoning is cast as the problem of simplifying the evidence-hypothesis relation and constructing combination formulas that possess certain testable properties. Important classes of evidence as identifiers, annihilators, and idempotents and their roles in determining binary operations on intervals of reals are discussed. The appropriate way of constructing formulas for combining evidence and their limitations, for instance, in robustness, are presented.
Submitted 27 March, 2013;
originally announced April 2013.
-
Evidence Combination and Reasoning and Its Application to Real-World Problem-Solving
Authors:
L. W. Chang,
Rangasami L. Kashyap
Abstract:
In this paper a new mathematical procedure is presented for combining different pieces of evidence which are represented in the interval form to reflect our knowledge about the truth of a hypothesis. Evidences may be correlated to each other (dependent evidences) or conflicting in supports (conflicting evidences). First, assuming independent evidences, we propose a methodology to construct combination rules which obey a set of essential properties. The method is based on a geometric model. We compare results obtained from Dempster-Shafer's rule and the proposed combination rules with both conflicting and non-conflicting data and show that the values generated by the proposed combining rules are in tune with our intuition in both cases. Secondly, in the case that evidences are known to be dependent, we consider extensions of the rules derived for handling conflicting evidence. The performance of the proposed rules is shown by different examples. The results show that the proposed rules make reasonable decisions under dependent evidences.
Submitted 27 March, 2013;
originally announced April 2013.