-
Gene42: Long-Range Genomic Foundation Model With Dense Attention
Authors:
Kirill Vishniakov,
Boulbaba Ben Amor,
Engin Tekin,
Nancy A. ElNaker,
Karthik Viswanathan,
Aleksandr Medvedev,
Aahan Singh,
Maryam Nadeem,
Mohammad Amaan Sayeed,
Praveenkumar Kanithi,
Tiago Magalhaes,
Natalia Vassilieva,
Dwarikanath Mahapatra,
Marco Pimentel,
and Shadab Khan
Abstract:
We introduce Gene42, a novel family of Genomic Foundation Models (GFMs) designed to handle context lengths of up to 192,000 base pairs (bp) at single-nucleotide resolution. Gene42 models use a decoder-only (LLaMA-style) architecture with dense self-attention. Initially trained on fixed-length sequences of 4,096 bp, the models then underwent continuous pretraining to extend the context length to 192,000 bp. This iterative extension enables comprehensive processing of large-scale genomic data and the capture of intricate patterns and dependencies within the human genome. Gene42 is the first dense-attention model in genomics capable of handling context lengths this long, challenging state-space models, which often rely on convolutional operators among other mechanisms. Our pretrained models exhibit notably low perplexity and high reconstruction accuracy, highlighting their strong ability to model genomic data. Extensive experiments on various genomic benchmarks demonstrate state-of-the-art performance across multiple tasks, including biotype classification, regulatory region identification, chromatin profiling prediction, variant pathogenicity prediction, and species classification. The models are publicly available at huggingface.co/inceptionai.
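The abstract reports perplexity without defining it. Under the standard definition for autoregressive models, perplexity is the exponential of the mean per-token negative log-likelihood; with single-nucleotide tokens, each token is one base. A minimal sketch, assuming the probabilities the model assigned to each observed nucleotide are already available (the inputs below are illustrative, not Gene42 outputs):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp(mean negative log-likelihood) over a sequence.

    token_probs: probability the model assigned to each observed token
    (assumed here to be one token per nucleotide).
    """
    if not token_probs:
        raise ValueError("need at least one token")
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# A model that guesses uniformly over {A, C, G, T} scores close to 4;
# a model that is always certain and correct scores exactly 1.
print(perplexity([0.25] * 8))
print(perplexity([1.0, 1.0, 1.0]))
```

Lower values therefore mean the model is, on average, less "surprised" by the next base; 4 is the natural reference point for a four-letter alphabet.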
Submitted 20 March, 2025;
originally announced March 2025.
-
Bridging Language Barriers in Healthcare: A Study on Arabic LLMs
Authors:
Nada Saadi,
Tathagata Raha,
Clément Christophe,
Marco AF Pimentel,
Ronnie Rajan,
Praveen K Kanithi
Abstract:
This paper investigates the challenges of developing large language models (LLMs) proficient in both multilingual understanding and medical knowledge. We demonstrate that simply translating medical data does not guarantee strong performance on clinical tasks in the target language. Our experiments reveal that the optimal language mix in training data varies significantly across different medical tasks. We find that larger models with carefully calibrated language ratios achieve superior performance on native-language clinical tasks. Furthermore, our results suggest that relying solely on fine-tuning may not be the most effective approach for incorporating new language knowledge into LLMs. Instead, data- and compute-intensive pretraining methods may still be necessary to achieve optimal performance in multilingual medical settings. These findings provide valuable guidance for building effective and inclusive medical AI systems for diverse linguistic communities.
Submitted 16 January, 2025;
originally announced January 2025.
-
Named Clinical Entity Recognition Benchmark
Authors:
Wadood M Abdul,
Marco AF Pimentel,
Muhammad Umar Salman,
Tathagata Raha,
Clément Christophe,
Praveen K Kanithi,
Nasir Hayat,
Ronnie Rajan,
Shadab Khan
Abstract:
This technical report introduces a Named Clinical Entity Recognition Benchmark for evaluating language models in healthcare, addressing the crucial natural language processing (NLP) task of extracting structured information from clinical narratives to support applications like automated coding, clinical trial cohort identification, and clinical decision support.
The leaderboard provides a standardized platform for assessing diverse language models, including encoder and decoder architectures, on their ability to identify and classify clinical entities across multiple medical domains. It uses a curated collection of openly available clinical datasets, encompassing entities such as diseases, symptoms, medications, procedures, and laboratory measurements. Importantly, these entities are standardized according to the Observational Medical Outcomes Partnership (OMOP) Common Data Model, ensuring consistency and interoperability across healthcare systems and datasets and enabling a comprehensive evaluation of model performance. Model performance is assessed primarily with the F1-score, complemented by several assessment modes that provide further insight. The report also includes a brief analysis of the models evaluated to date, highlighting observed trends and limitations.
By establishing this benchmarking framework, the leaderboard aims to promote transparency, facilitate comparative analyses, and drive innovation in clinical entity recognition tasks, addressing the need for robust evaluation methods in healthcare NLP.
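The abstract names the F1-score but not the matching rule. A common strict convention counts a predicted entity as correct only if both its span and its type match a gold entity exactly. The sketch below assumes that convention and a hypothetical `(start, end, label)` tuple representation with illustrative OMOP-style labels; the leaderboard's own assessment modes may use different rules (for example, partial overlap).

```python
from typing import Iterable, Tuple

# Hypothetical representation: (start offset, end offset, entity type).
Entity = Tuple[int, int, str]

def strict_f1(gold: Iterable[Entity], pred: Iterable[Entity]) -> float:
    """Entity-level F1 where a prediction counts only on exact span and type match."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)  # predictions that exactly match a gold entity
    if tp == 0:
        return 0.0  # also avoids division by zero when either set is empty
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)

gold = [(0, 8, "condition"), (15, 24, "drug"), (30, 37, "measurement")]
pred = [(0, 8, "condition"), (15, 24, "procedure"), (30, 37, "measurement")]
print(strict_f1(gold, pred))  # correct span but wrong type on the drug entity lowers the score
```

Here two of three predictions match exactly, so precision and recall are both 2/3 and F1 is 2/3; a lenient mode that ignored the type would instead score 1.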
Submitted 7 October, 2024;
originally announced October 2024.
-
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs
Authors:
Clément Christophe,
Tathagata Raha,
Svetlana Maslenkova,
Muhammad Umar Salman,
Praveen K Kanithi,
Marco AF Pimentel,
Shadab Khan
Abstract:
Large Language Models (LLMs) have demonstrated significant potential in transforming clinical applications. In this study, we investigate the efficacy of four techniques in adapting LLMs for clinical use-cases: continuous pretraining, instruct fine-tuning, NEFTune, and prompt engineering. We employ these methods on Mistral 7B and Mixtral 8x7B models, leveraging a large-scale clinical pretraining dataset of 50 billion tokens and an instruct fine-tuning dataset of 500 million tokens. Our evaluation across various clinical tasks reveals the impact of each technique. While continuous pretraining beyond 250 billion tokens yields marginal improvements on its own, it establishes a strong foundation for instruct fine-tuning. Notably, NEFTune, designed primarily to enhance generation quality, surprisingly demonstrates additional gains on our benchmark. Complex prompt engineering methods further enhance performance. These findings show the importance of tailoring fine-tuning strategies and exploring innovative techniques to optimize LLM performance in the clinical domain.
Submitted 23 September, 2024;
originally announced September 2024.
-
MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications
Authors:
Praveen K Kanithi,
Clément Christophe,
Marco AF Pimentel,
Tathagata Raha,
Nada Saadi,
Hamza Javed,
Svetlana Maslenkova,
Nasir Hayat,
Ronnie Rajan,
Shadab Khan
Abstract:
The rapid development of Large Language Models (LLMs) for healthcare applications has spurred calls for holistic evaluation beyond frequently cited benchmarks like USMLE, to better reflect real-world performance. While real-world assessments are valuable indicators of utility, they often lag behind the pace of LLM evolution, likely rendering findings obsolete upon deployment. This temporal disconnect necessitates a comprehensive upfront evaluation that can guide model selection for specific clinical applications. We introduce MEDIC, a framework assessing LLMs across five critical dimensions of clinical competence: medical reasoning, ethics and bias, data and language understanding, in-context learning, and clinical safety. MEDIC features a novel cross-examination framework quantifying LLM performance across areas like coverage and hallucination detection, without requiring reference outputs. We apply MEDIC to evaluate LLMs on medical question-answering, safety, summarization, note generation, and other tasks. Our results show performance disparities across model sizes and between baseline and medically fine-tuned models, with implications for model selection in applications requiring specific model strengths, such as low hallucination or lower cost of inference. MEDIC's multifaceted evaluation reveals these performance trade-offs, bridging the gap between theoretical capabilities and practical implementation in healthcare settings, ensuring that the most promising models are identified and adapted for diverse healthcare applications.
Submitted 11 September, 2024;
originally announced September 2024.
-
Med42-v2: A Suite of Clinical LLMs
Authors:
Clément Christophe,
Praveen K Kanithi,
Tathagata Raha,
Shadab Khan,
Marco AF Pimentel
Abstract:
Med42-v2 introduces a suite of clinical large language models (LLMs) designed to address the limitations of generic models in healthcare settings. These models are built on the Llama3 architecture and fine-tuned on specialized clinical data. They underwent multi-stage preference alignment to respond effectively to natural prompts. While generic models are often preference-aligned to avoid answering clinical queries as a precaution, Med42-v2 is specifically trained to overcome this limitation, enabling its use in clinical settings. Med42-v2 models outperform the original Llama3 models, in both the 8B and 70B parameter configurations, as well as GPT-4 across various medical benchmarks. These LLMs are developed to understand clinical queries, perform reasoning tasks, and provide valuable assistance in clinical environments. The models are now publicly available at https://huggingface.co/m42-health.
Submitted 12 August, 2024;
originally announced August 2024.
-
Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks
Authors:
Marco AF Pimentel,
Clément Christophe,
Tathagata Raha,
Prateek Munjal,
Praveen K Kanithi,
Shadab Khan
Abstract:
As large language models (LLMs) continue to evolve, the need for robust and standardized evaluation benchmarks becomes paramount. Evaluating the performance of these models is a complex challenge that requires careful consideration of various linguistic tasks, model architectures, and benchmarking methodologies. In recent years, various frameworks have emerged as noteworthy contributions to the field, offering comprehensive evaluation tests and benchmarks for assessing the capabilities of LLMs across diverse domains. This paper provides an exploration and critical analysis of some of these evaluation methodologies, shedding light on their strengths, limitations, and impact on advancing the state-of-the-art in natural language processing.
Submitted 28 July, 2024;
originally announced July 2024.
-
Med42 -- Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches
Authors:
Clément Christophe,
Praveen K Kanithi,
Prateek Munjal,
Tathagata Raha,
Nasir Hayat,
Ronnie Rajan,
Ahmed Al-Mahrooqi,
Avani Gupta,
Muhammad Umar Salman,
Gurpreet Gosal,
Bhargav Kanakiya,
Charles Chen,
Natalia Vassilieva,
Boulbaba Ben Amor,
Marco AF Pimentel,
Shadab Khan
Abstract:
This study presents a comprehensive analysis and comparison of two predominant fine-tuning methodologies - full-parameter fine-tuning and parameter-efficient tuning - within the context of medical Large Language Models (LLMs). We developed and refined a series of LLMs, based on the Llama-2 architecture, specifically designed to enhance medical knowledge retrieval, reasoning, and question-answering capabilities. Our experiments systematically evaluate the effectiveness of these tuning strategies across various well-known medical benchmarks. Notably, our medical LLM Med42 showed an accuracy level of 72% on the US Medical Licensing Examination (USMLE) datasets, setting a new standard in performance for openly available medical LLMs. Through this comparative analysis, we aim to identify the most effective and efficient method for fine-tuning LLMs in the medical domain, thereby contributing significantly to the advancement of AI-driven healthcare applications.
Submitted 23 April, 2024;
originally announced April 2024.
-
A Data-Driven Biophysical Computational Model of Parkinson's Disease based on Marmoset Monkeys
Authors:
Caetano M. Ranieri,
Jhielson M. Pimentel,
Marcelo R. Romano,
Leonardo A. Elias,
Roseli A. F. Romero,
Michael A. Lones,
Mariana F. P. Araujo,
Patricia A. Vargas,
Renan C. Moioli
Abstract:
In this work, we propose a new biophysical computational model of brain regions relevant to Parkinson's disease (PD), based on local field potential data collected from the brains of marmoset monkeys. Parkinson's disease is a neurodegenerative disorder linked to the death of dopaminergic neurons in the substantia nigra pars compacta, which affects the normal dynamics of the basal ganglia-thalamus-cortex neuronal circuit. Although multiple mechanisms underlie the disease, a complete description of those mechanisms and of its molecular pathogenesis is still missing, and there is still no cure. To address this gap, computational models that reproduce neurobiological aspects found in animal models have been proposed. We adopt a data-driven approach in which a set of biologically constrained parameters is optimised using differential evolution. The evolved models successfully matched single-neuron mean firing rates and spectral signatures of local field potentials from healthy and parkinsonian marmoset brain data. To the best of our knowledge, this is the first computational model of Parkinson's disease based on simultaneous electrophysiological recordings from seven brain regions of marmoset monkeys. Results show that the proposed model could facilitate the investigation of the mechanisms of PD and support the development of techniques that may indicate new therapies. It could also be applied to other computational neuroscience problems in which biological data can be used to fit multi-scale models of brain circuits.
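The abstract states that the biologically constrained parameters were optimised with differential evolution but gives no settings. As a generic illustration only, not the authors' configuration, here is the classic DE/rand/1/bin scheme minimising a toy objective inside box bounds. In the paper's setting the objective would instead measure the mismatch between simulated and recorded firing rates and LFP spectra, and the bounds would encode the biological constraints; the population size, `f`, `cr`, and the `sphere` objective below are all assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

def differential_evolution(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    generations: int = 200,
    f: float = 0.8,   # differential weight (mutation scale), assumed value
    cr: float = 0.9,  # crossover probability, assumed value
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimal DE/rand/1/bin minimiser with box constraints (requires pop_size >= 4)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialise the population uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for k in range(dim):
                if rng.random() < cr or k == j_rand:
                    v = pop[a][k] + f * (pop[b][k] - pop[c][k])  # mutation
                else:
                    v = pop[i][k]  # inherit from the current individual
                lo, hi = bounds[k]
                trial.append(min(max(v, lo), hi))  # clip to the allowed range
            s = objective(trial)
            # Greedy selection: keep the trial if it is no worse.
            if s <= scores[i]:
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

def sphere(x: Sequence[float]) -> float:
    # Toy stand-in for a model-versus-data error; minimum 0 at the origin.
    return sum(v * v for v in x)

best_x, best_score = differential_evolution(sphere, bounds=[(-5.0, 5.0)] * 3)
print(best_x, best_score)
```

DE needs only objective evaluations, not gradients, which is one reason it suits fitting simulators whose outputs (firing rates, spectra) are not differentiable with respect to their parameters.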
Submitted 1 September, 2021; v1 submitted 26 July, 2021;
originally announced July 2021.
-
Oligopoly Dynamics
Authors:
Bernardo Melo Pimentel
Abstract:
The present notes summarise the oligopoly dynamics lectures Professor Luís Cabral gave at the Bank of Portugal in September and October 2017. The lectures discuss a set of industrial organisation problems in a dynamic environment, namely learning by doing, switching costs, price wars, networks and platforms, and ladder models of innovation. Methodologically, the materials cover analytical solutions at known points (e.g., $\delta = 0$), the discussion of firms' strategies based on intuitions derived directly from their value functions without solving the model, and the combination of analytical and numerical procedures to reach model solutions. State space analysis is done for both continuous and discrete cases. All errors are my own.
Submitted 27 May, 2020;
originally announced May 2020.
-
Remote health monitoring and diagnosis in the time of COVID-19
Authors:
Joachim A. Behar,
Chengyu Liu,
Kevin Kotzen,
Kenta Tsutsui,
Valentina D. A. Corino,
Janmajay Singh,
Marco A. F. Pimentel,
Philip Warrick,
Sebastian Zaunseder,
Fernando Andreotti,
David Sebag,
Georgy Popanitsa,
Patrick E. McSharry,
Walter Karlen,
Chandan Karmakar,
Gari D. Clifford
Abstract:
Coronavirus disease (COVID-19) is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is rapidly spreading across the globe. The clinical spectrum of SARS-CoV-2 pneumonia ranges from mild to critically ill cases and requires early detection and monitoring, within a clinical environment for critical cases and remotely for mild cases. The fear of contamination in clinical environments has led to a dramatic reduction in on-site referrals for routine care. There has also been a perceived need to continuously monitor non-severe COVID-19 patients, either at home or in dedicated quarantine locations (e.g., hotels). Thus, the pandemic has created incentives to innovate and to enhance or create new routes for providing healthcare services at a distance. In particular, it has given a dramatic impetus to finding innovative ways to monitor patient health status remotely and effectively. In this paper, we present a short review of remote health monitoring initiatives taken in 19 states during the pandemic. In the discussion, we emphasize aspects common to the reviewed states, in particular the future impact of the pandemic on remote health monitoring and considerations of data privacy.
Submitted 15 October, 2020; v1 submitted 18 May, 2020;
originally announced May 2020.
-
Fusarium Damaged Kernels Detection Using Transfer Learning on Deep Neural Network Architecture
Authors:
Márcio Nicolau,
Márcia Barrocas Moreira Pimentel,
Casiane Salete Tibola,
José Mauricio Cunha Fernandes,
Willingthon Pavan
Abstract:
The present work applies transfer learning to a pre-trained deep neural network (DNN) using a small image dataset (≈12,000 images) on a single workstation with an NVIDIA GPU; training takes up to 1 hour and achieves an overall average accuracy of 94.7%. The DNN has a misclassification rate of 20% on an external test dataset. The accuracy of the proposed methodology is comparable to that of hyperspectral imaging (HSI) methods (81% to 91%) used for the same task, with the advantage of not depending on special equipment to classify wheat kernels for Fusarium head blight (FHB) symptoms.
Submitted 31 January, 2018;
originally announced February 2018.