Large Language Models Perform on Par with Experts Identifying Mental Health Factors in Adolescent Online Forums

Authors: Isabelle Lorge, Dan W. Joyce, Andrey Kormilitzin

Abstract: Mental health in children and adolescents has been steadily deteriorating over the past few years. The recent advent of Large Language Models (LLMs) offers much hope for cost- and time-efficient scaling of monitoring and intervention, yet despite specifically prevalent issues such as school bullying and eating disorders, previous studies have not investigated performance in this domain or for open information extraction where the set of answers is not predetermined. We create a new dataset of Reddit posts from adolescents aged 12-19 annotated by expert psychiatrists for the following categories: TRAUMA, PRECARITY, CONDITION, SYMPTOMS, SUICIDALITY and TREATMENT and compare expert labels to annotations from two top performing LLMs (GPT3.5 and GPT4). In addition, we create two synthetic datasets to assess whether LLMs perform better when annotating data as they generate it. We find GPT4 to be on par with human inter-annotator agreement and performance on synthetic data to be substantially higher; however, we find the model still occasionally errs on issues of negation and factuality, and higher performance on synthetic data is driven by greater complexity of real data rather than inherent advantage.

Submitted 26 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2403.19802 [pdf, other]

Developing Healthcare Language Model Embedding Spaces

Authors: Niall Taylor, Dan Schofield, Andrey Kormilitzin, Dan W Joyce, Alejo Nevado-Holgado

Abstract: Pre-trained Large Language Models (LLMs) often struggle on out-of-domain datasets like healthcare-focused text. We explore specialized pre-training to adapt smaller LLMs to different healthcare datasets. Three methods are assessed: traditional masked language modeling, Deep Contrastive Learning for Unsupervised Textual Representations (DeCLUTR), and a novel pre-training objective utilizing metadata categories from the healthcare settings. These schemes are evaluated on downstream document classification tasks for each dataset, with additional analysis of the resultant embedding spaces. Contrastively trained models outperform other approaches on the classification tasks, delivering strong performance from limited labeled data and with fewer model parameter updates required. While metadata-based pre-training does not further improve classifications across the datasets, it yields interesting embedding cluster separability. All domain adapted LLMs outperform their publicly available general base LLM, validating the importance of domain-specialization. This research illustrates efficient approaches to instill healthcare competency in compact LLMs even under tight computational budgets, an essential capability for responsible and sustainable deployment in local healthcare settings. We provide pre-training guidelines for specialized healthcare LLMs, motivate continued inquiry into contrastive objectives, and demonstrate adaptation techniques to align small LLMs with privacy-sensitive medical tasks.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.19790 [pdf, other]

Bespoke Large Language Models for Digital Triage Assistance in Mental Health Care

Authors: Niall Taylor, Andrey Kormilitzin, Isabelle Lorge, Alejo Nevado-Holgado, Dan W Joyce

Abstract: Contemporary large language models (LLMs) may have utility for processing unstructured, narrative free-text clinical data contained in electronic health records (EHRs) -- a particularly important use-case for mental health where a majority of routinely-collected patient data lacks structured, machine-readable content. A significant problem for the United Kingdom's National Health Service (NHS) is the long waiting lists for specialist mental healthcare. According to NHS data, in each month of 2023, there were between 370,000 and 470,000 individual new referrals into secondary mental healthcare services. Referrals must be triaged by clinicians, using clinical information contained in the patient's EHR to arrive at a decision about the most appropriate mental healthcare team to assess and potentially treat these patients. The ability to efficiently recommend a relevant team by ingesting potentially voluminous clinical notes could help services both reduce referral waiting times and, with the right technology, improve the evidence available to justify triage decisions. We present and evaluate three different approaches for LLM-based, end-to-end ingestion of variable-length clinical EHR data to assist clinicians when triaging referrals.
Our model is able to deliver triage recommendations consistent with existing clinical practices and its architecture was implemented on a single GPU, making it practical for implementation in resource-limited NHS environments where private implementations of LLM technology will be necessary to ensure confidential clinical data is appropriately controlled and governed.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2402.10597 [pdf, other]

Efficiency at Scale: Investigating the Performance of Diminutive Language Models in Clinical Tasks

Authors: Niall Taylor, Upamanyu Ghose, Omid Rohanian, Mohammadmahdi Nouriborji, Andrey Kormilitzin, David Clifton, Alejo Nevado-Holgado

Abstract: The entry of large language models (LLMs) into research and commercial spaces has led to a trend of ever-larger models, with initial promises of generalisability, followed by a widespread desire to downsize and create specialised models without the need for complete fine-tuning, using Parameter Efficient Fine-tuning (PEFT) methods. We present an investigation into the suitability of different PEFT methods to clinical decision-making tasks, across a range of model sizes, including extremely small models with as few as 25 million parameters. Our analysis shows that the performance of most PEFT approaches varies significantly from one task to another, with the exception of LoRA, which maintains relatively high performance across all model sizes and tasks, typically approaching or matching full fine-tuned performance. The effectiveness of PEFT methods in the clinical domain is evident, particularly for specialised models which can operate on low-cost, in-house computing infrastructure. The advantages of these models, in terms of speed and reduced training costs, dramatically outweigh any performance gain from large foundation LLMs. Furthermore, we highlight how domain-specific pre-training interacts with PEFT methods and model size, and discuss how these factors interplay to provide the best efficiency-performance trade-off. Full code available at: tbd.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.07645 [pdf, other]

Detecting the Clinical Features of Difficult-to-Treat Depression using Synthetic Data from Large Language Models

Authors: Isabelle Lorge, Dan W. Joyce, Niall Taylor, Alejo Nevado-Holgado, Andrea Cipriani, Andrey Kormilitzin

Abstract: Difficult-to-treat depression (DTD) has been proposed as a broader and more clinically comprehensive perspective on a person's depressive disorder where, despite treatment, they continue to experience significant burden. We sought to develop a Large Language Model (LLM)-based tool capable of interrogating routinely-collected, narrative (free-text) electronic health record (EHR) data to locate published prognostic factors that capture the clinical syndrome of DTD. In this work, we use LLM-generated synthetic data (GPT3.5) and a Non-Maximum Suppression (NMS) algorithm to train a BERT-based span extraction model. The resulting model is then able to extract and label spans related to a variety of relevant positive and negative factors in real clinical data (i.e. spans of text that increase or decrease the likelihood of a patient matching the DTD syndrome). We show it is possible to obtain good overall performance (0.70 F1 across polarity) on real clinical data on a set of as many as 20 different factors, and high performance (0.85 F1 with 0.95 precision) on a subset of important DTD factors such as history of abuse, family history of affective disorder, illness severity and suicidality by training the model exclusively on synthetic data. Our results show promise for future healthcare applications, especially those where highly confidential medical data and human-expert annotation would traditionally be required.

Submitted 12 February, 2024; originally announced February 2024.
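
The abstract of the entry above names Non-Maximum Suppression (NMS) but does not say how it is applied to text spans. As a generic illustration only, and not the authors' implementation, the sketch below shows greedy NMS over scored character spans: the highest-scoring span is kept, and any lower-scoring span that overlaps a kept span too heavily is dropped. The names `span_iou` and `non_max_suppression`, the `(start, end, score)` tuple layout, and the 0.5 overlap threshold are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical span layout: (start, end_exclusive, score).
Span = Tuple[int, int, float]


def span_iou(a: Span, b: Span) -> float:
    """Intersection-over-union of two 1-D spans (lengths in characters or tokens)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(spans: List[Span], iou_threshold: float = 0.5) -> List[Span]:
    """Greedy NMS: visit spans by descending score and keep a span only if
    its overlap with every already-kept span is at most iou_threshold."""
    kept: List[Span] = []
    for candidate in sorted(spans, key=lambda s: s[2], reverse=True):
        if all(span_iou(candidate, k) <= iou_threshold for k in kept):
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    candidates = [(0, 10, 0.9), (2, 10, 0.8), (20, 25, 0.7)]
    # The second span overlaps the first with IoU 0.8, so it is suppressed.
    print(non_max_suppression(candidates))
```

The threshold trades recall for de-duplication: a lower value removes more near-duplicate spans but may also discard genuinely distinct, partially overlapping ones.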

arXiv:2205.05535 [pdf, other]

Clinical Prompt Learning with Frozen Language Models

Authors: Niall Taylor, Yi Zhang, Dan Joyce, Alejo Nevado-Holgado, Andrey Kormilitzin

Abstract: Prompt learning is a new paradigm in the Natural Language Processing (NLP) field which has shown impressive performance on a number of natural language tasks with common benchmarking text datasets in full, few-shot, and zero-shot train-evaluation setups. Recently, it has even been observed that large but frozen pre-trained language models (PLMs) with prompt learning outperform smaller but fine-tuned models. However, as with many recent NLP trends, even the largest PLMs such as GPT-3 do not perform well on specialized domains (e.g. medical text), and the common practice to achieve State of the Art (SoTA) results still consists of pre-training and fine-tuning the PLMs on downstream tasks. The reliance on fine-tuning large PLMs is problematic in clinical settings where data is often held in non-GPU environments, and more resource-efficient methods of training specialized domain models are crucial. We investigated the viability of prompt learning on clinically meaningful decision tasks and directly compared with more traditional fine-tuning methods. Results are partially in line with the prompt learning literature, with prompt learning able to match or improve on traditional fine-tuning with substantially fewer trainable parameters and requiring less training data. We argue that prompt learning therefore provides lower computational resource costs applicable to clinical settings, and that it can serve as an alternative to fine-tuning ever-larger PLMs.
Complementary code to reproduce experiments presented in this work can be found at: https://github.com/NtaylorOX/Public_Clinical_Prompt.

Submitted 11 May, 2022; originally announced May 2022.

Comments: 18 pages, 6 figures, 6 tables

ACM Class: J.2

arXiv:2111.07611 [pdf, other]

Rationale production to support clinical decision-making

Authors: Niall Taylor, Lei Sha, Dan W Joyce, Thomas Lukasiewicz, Alejo Nevado-Holgado, Andrey Kormilitzin

Abstract: The development of neural networks for clinical artificial intelligence (AI) is reliant on interpretability, transparency, and performance. The need to delve into the black-box neural network and derive interpretable explanations of model output is paramount. A task of high clinical importance is predicting the likelihood of a patient being readmitted to hospital in the near future to enable efficient triage. With the increasing adoption of electronic health records (EHRs), there is great interest in applications of natural language processing (NLP) to clinical free-text contained within EHRs. In this work, we apply InfoCal, the current state-of-the-art model that produces extractive rationales for its predictions, to the task of predicting hospital readmission using hospital discharge notes. We compare extractive rationales produced by InfoCal to competitive transformer-based models pretrained on clinical text data and for which the attention mechanism can be used for interpretation. We find each presented model with selected interpretability or feature importance methods yields varying results, with clinical language domain expertise and pretraining critical to performance and subsequent interpretability.

Submitted 15 November, 2021; originally announced November 2021.

Comments: Machine Learning for Health (ML4H) - Extended Abstract

arXiv:2010.12260 [pdf, other]

Population Gradients improve performance across data-sets and architectures in object classification

Authors: Yurika Sakai, Andrey Kormilitzin, Qiang Liu, Alejo Nevado-Holgado

Abstract: The most successful methods such as ReLU transfer functions, batch normalization, Xavier initialization, dropout, learning rate decay, or dynamic optimizers, have become standards in the field due, particularly, to their ability to increase the performance of Neural Networks (NNs) significantly and in almost all situations. Here we present a new method to calculate the gradients while training NNs, and show that it significantly improves final performance across architectures, data-sets, hyper-parameter values, training length, and model sizes, including when it is combined with other common performance-improving methods (such as the ones mentioned above). Besides being effective in the wide array of situations that we have tested, the increase in performance (e.g. F1) it provides is as high as or higher than that of all the other widespread performance-improving methods that we have compared against. We call our method Population Gradients (PG), and it consists of using a population of NNs to calculate a non-local estimation of the gradient, which is closer to the theoretical exact gradient (i.e. the one obtainable only with an infinitely big data-set) of the error function than the empirical gradient (i.e. the one obtained with the real finite data-set).

Submitted 23 October, 2020; originally announced October 2020.

arXiv:2010.08433 [pdf, other]

An efficient representation of chronological events in medical texts

Authors: Andrey Kormilitzin, Nemanja Vaci, Qiang Liu, Hao Ni, Goran Nenadic, Alejo Nevado-Holgado

Abstract: In this work we addressed the problem of capturing sequential information contained in longitudinal electronic health records (EHRs). Clinical notes, which are a particular type of EHR data, are a rich source of information and practitioners often develop clever solutions to maximise the sequential information contained in free-texts. We proposed a systematic methodology for learning from chronological events available in clinical notes. The proposed methodological path signature framework creates a non-parametric hierarchical representation of sequential events of any type and can be used as features for downstream statistical learning tasks. The methodology was developed and externally validated using the largest secondary care mental health EHR data in the UK on a specific task of predicting survival risk of patients diagnosed with Alzheimer's disease. The signature-based model was compared to a common survival random forest model. Our results showed a 15.4% increase of risk prediction AUC at the time point of 20 months after the first admission to a specialist memory clinic and the signature method outperformed the baseline mixed-effects model by 13.2%.

Submitted 24 October, 2020; v1 submitted 16 October, 2020; originally announced October 2020.

Comments: 4 pages, 2 figures, 7 tables

arXiv:2003.01271 [pdf, other]

Med7: a transferable clinical natural language processing model for electronic health records

Authors: Andrey Kormilitzin, Nemanja Vaci, Qiang Liu, Alejo Nevado-Holgado

Abstract: The field of clinical natural language processing has advanced significantly since the introduction of deep learning models. Self-supervised representation learning and the transfer learning paradigm became the methods of choice in many natural language processing applications, in particular in settings with a dearth of high quality manually annotated data. Electronic health record systems are ubiquitous and the majority of patients' data are now being collected electronically and in particular in the form of free text. Identification of medical concepts and information extraction is a challenging task, yet an important ingredient for parsing unstructured data into structured and tabulated format for downstream analytical tasks. In this work we introduced a named-entity recognition model for clinical natural language processing. The model is trained to recognise seven categories: drug names, route, frequency, dosage, strength, form, duration. The model was first pre-trained in a self-supervised manner by predicting the next word, using a collection of 2 million free-text patients' records from MIMIC-III corpora, and then fine-tuned on the named-entity recognition task. The model achieved a lenient (strict) micro-averaged F1 score of 0.957 (0.893) across all seven categories. Additionally, we evaluated the transferability of the developed model using the data from the Intensive Care Unit in the US to secondary care mental health records (CRIS) in the UK.
A direct application of the trained NER model to CRIS data resulted in reduced performance of F1=0.762; however, after fine-tuning on a small sample from CRIS, the model achieved a reasonable performance of F1=0.944. This demonstrated that despite a close similarity between the data sets and the NER tasks, it is essential to fine-tune on the target domain data in order to achieve more accurate results.

Submitted 24 April, 2020; v1 submitted 2 March, 2020; originally announced March 2020.

Comments: 16 pages, 1 figure, 15 tables

arXiv:1908.11399 [pdf, other]

Deep Learning for Estimating Synaptic Health of Primary Neuronal Cell Culture

Authors: Andrey Kormilitzin, Xinyu Yang, William H. Stone, Caroline Woffindale, Francesca Nicholls, Elena Ribe, Alejo Nevado-Holgado, Noel Buckley

Abstract: Understanding the morphological changes of primary neuronal cells induced by chemical compounds is essential for drug discovery. Using the data from a single high-throughput imaging assay, a classification model for predicting the biological activity of candidate compounds was introduced. The image recognition model, which is based on a deep convolutional neural network (CNN) architecture with residual connections, achieved accuracy of 99.6% on a binary classification task of distinguishing untreated rodent primary neuronal cells from those treated with Amyloid-$\beta_{(25-35)}$.

Submitted 29 August, 2019; originally announced August 2019.

Comments: 11 pages, 5 figures

arXiv:1901.01592 [pdf]

Named Entity Recognition in Electronic Health Records Using Transfer Learning Bootstrapped Neural Networks

Authors: Luka Gligic, Andrey Kormilitzin, Paul Goldberg, Alejo Nevado-Holgado

Abstract: Neural networks (NNs) have become the state of the art in many machine learning applications, especially in image and sound processing [1]. The same, although to a lesser extent [2,3], could be said in natural language processing (NLP) tasks, such as named entity recognition. However, the success of NNs remains dependent on the availability of large labelled datasets, which is a significant hurdle in many important applications. One such case is electronic health records (EHRs), which are arguably the largest source of medical data, most of which lies hidden in natural text [4,5]. Data access is difficult due to data privacy concerns, and therefore annotated datasets are scarce. With scarce data, NNs will likely not be able to extract this hidden information with practical accuracy. In our study, we develop an approach that solves these problems for named entity recognition, obtaining 94.6 F1 score in I2B2 2009 Medical Extraction Challenge [6], 4.3 above the architecture that won the competition. Beyond the official I2B2 challenge, we further achieve 82.4 F1 on extracting relationships between medical terms. To reach this state-of-the-art accuracy, our approach applies transfer learning to leverage datasets annotated for other I2B2 tasks, and designs and trains embeddings that specially benefit from such transfer.

Submitted 29 July, 2019; v1 submitted 6 January, 2019; originally announced January 2019.

Comments: 11 pages, 4 figures, 8 tables

arXiv:1811.05468 [pdf]

Few-shot Learning for Named Entity Recognition in Medical Text

Authors: Maximilian Hofer, Andrey Kormilitzin, Paul Goldberg, Alejo Nevado-Holgado

Abstract: Deep neural network models have recently achieved state-of-the-art performance gains in a variety of natural language processing (NLP) tasks (Young, Hazarika, Poria, & Cambria, 2017). However, these gains rely on the availability of large amounts of annotated examples, without which state-of-the-art performance is rarely achievable. This is especially inconvenient for the many NLP fields where annotated examples are scarce, such as medical text. To improve NLP models in this situation, we evaluate five improvements on named entity recognition (NER) tasks when only ten annotated examples are available: (1) layer-wise initialization with pre-trained weights, (2) hyperparameter tuning, (3) combining pre-training data, (4) custom word embeddings, and (5) optimizing out-of-vocabulary (OOV) words. Experimental results show that the F1 score of 69.3% achievable by state-of-the-art models can be improved to 78.87%.

Submitted 13 November, 2018; originally announced November 2018.

Comments: 10 pages, 4 figures, 4 tables

arXiv:1708.01206 [pdf, other]

Detecting early signs of depressive and manic episodes in patients with bipolar disorder using the signature-based model

Authors: Andrey Kormilitzin, Kate E. A. Saunders, Paul J. Harrison, John R. Geddes, Terry Lyons

Abstract: Recurrent major mood episodes and subsyndromal mood instability cause substantial disability in patients with bipolar disorder. Early identification of mood episodes enabling timely mood stabilisation is an important clinical goal. Recent technological advances allow the prospective reporting of mood in real time enabling more accurate, efficient data capture. The complex nature of these data streams, in combination with the challenge of deriving meaning from missing data, poses a significant analytic challenge. The signature method is derived from stochastic analysis and has the ability to capture important properties of complex ordered time series data. We aim to explore whether the onset of episodes of mania and depression can be identified using self-reported mood data.

Submitted 3 August, 2017; originally announced August 2017.

Comments: 12 pages, 3 tables, 10 figures

arXiv:1606.02074 [pdf, ps, other]

Application of the Signature Method to Pattern Recognition in the CEQUEL Clinical Trial

Authors: A. B. Kormilitzin, K. E. A. Saunders, P. J. Harrison, J. R. Geddes, T. J. Lyons

Abstract: The classification procedure of streaming data usually requires various ad hoc methods or particular heuristic models. We explore a novel non-parametric and systematic approach to analysis of heterogeneous sequential data. We demonstrate an application of this method to classification of the delays in responding to the prompts, from subjects with bipolar disorder collected during a clinical trial, using both synthetic and real examples. We show how this method can provide a natural and systematic way to extract characteristic features from sequential data.

Submitted 7 June, 2016; originally announced June 2016.

Comments: 16 pages, 7 figures

arXiv:1603.03788 [pdf, other]

A Primer on the Signature Method in Machine Learning

Authors: Ilya Chevyrev, Andrey Kormilitzin

Abstract: We provide an introduction to the signature method, focusing on its theoretical properties and machine learning applications. Our presentation is divided into two parts. In the first part, we present the definition and fundamental properties of the signature of a path. The signature is a sequence of numbers associated with a path that captures many of its important analytic and geometric properties. As a sequence of numbers, the signature serves as a compact description (dimension reduction) of a path. In presenting its theoretical properties, we assume only familiarity with classical real analysis and integration, and supplement theory with straightforward examples. We also mention several advanced topics, including the role of the signature in rough path theory. In the second part, we present practical applications of the signature to the area of machine learning. The signature method is a non-parametric way of transforming data into a set of features that can be used in machine learning tasks. In this method, data are converted into multi-dimensional paths, by means of embedding algorithms, of which the signature is then computed. We describe this pipeline in detail, making a link with the properties of the signature presented in the first part. We furthermore review some of the developments of the signature method in machine learning and, as an illustrative example, present a detailed application of the method to handwritten digit classification.

Submitted 17 January, 2025; v1 submitted 11 March, 2016; originally announced March 2016.

Comments: 61 pages, 26 figures, 3 tables. Expanded Part 1 and simplified the presentation in Part 2. To appear in Open Access in a forthcoming Springer volume "Signatures Methods in Finance: An Introduction with Computational Applications"
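
To make the idea of a path signature from the entry above (arXiv:1603.03788) more concrete, here is a minimal sketch that computes the first two signature levels of a piecewise-linear path with NumPy. It is not taken from the paper: the function name `signature_depth2` is hypothetical, and the update rule relies on the standard fact (Chen's identity) that appending a straight segment with increment `delta` adds `outer(S1, delta) + outer(delta, delta) / 2` to the second level.

```python
import numpy as np


def signature_depth2(path):
    """First two signature levels of a piecewise-linear path.

    path: array-like of shape (n_points, d), the vertices of the path.
    Returns (level1, level2), where level1 has shape (d,) and holds the
    total increment, and level2 has shape (d, d) and holds the twice
    iterated integrals S^{ij}.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    level1 = np.zeros(d)
    level2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment:
        # the new level 2 is old level 2 + (old level 1) x delta + delta x delta / 2.
        level2 += np.outer(level1, delta) + 0.5 * np.outer(delta, delta)
        level1 += delta
    return level1, level2


if __name__ == "__main__":
    # An L-shaped path in the plane: right one unit, then up one unit.
    s1, s2 = signature_depth2([[0, 0], [1, 0], [1, 1]])
    print(s1)
    print(s2)
    # Levy area: half the antisymmetric part of level 2.
    print(0.5 * (s2[0, 1] - s2[1, 0]))
```

Flattening `level1` and `level2` into one vector gives a fixed-length feature set for a path of any length, which is the sense in which the signature acts as a non-parametric feature map in the pipeline the abstract describes.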

arXiv:1411.2294 [pdf, ps, other]

doi 10.1103/PhysRevD.91.045005

Analytic structure of the $n = 7$ scattering amplitude in $\mathcal{N}=4$ SYM theory in multi-Regge kinematics: Conformal Regge cut contribution

Authors: Jochen Bartels, Andrey Kormilitzin, Lev N. Lipatov

Abstract: In this second part of our investigation of the analytic structure of the $2\to5$ scattering amplitude in the planar limit of $\mathcal{N}=4$ SYM in multi-Regge kinematics we compute, in all kinematic regions, the Regge cut contributions in leading order. The results are infrared finite and conformally invariant.

Submitted 9 November, 2014; originally announced November 2014.

Comments: 44 pages, 14 figures, 2 tables

Journal ref: Phys. Rev. D 91, 045005 (2015)

arXiv:1311.2061 [pdf, ps, other]

doi 10.1103/PhysRevD.89.065002

Analytic structure of the $n=7$ scattering amplitude in $\mathcal{N}=4$ SYM theory at multi-Regge kinematics: Conformal Regge pole contribution

Authors: Jochen Bartels, Andrey Kormilitzin, Lev Lipatov

Abstract: We investigate the analytic structure of the $2\to5$ scattering amplitude in the planar limit of $\mathcal{N}=4$ SYM in multi-Regge kinematics in all physical regions. We demonstrate the close connection between Regge pole and Regge cut contributions: in a selected class of kinematic regions (Mandelstam regions) the usual factorizing Regge pole formula develops unphysical singularities which have to be absorbed and compensated by Regge cut contributions. This leads, in the corrections to the BDS formula, to conformally invariant 'renormalized' Regge pole expressions in the remainder function. We compute these renormalized Regge poles for the $2\to5$ scattering amplitude.

Submitted 19 May, 2014; v1 submitted 8 November, 2013; originally announced November 2013.

Comments: 46 pages, references added, typos corrected, journal version

Report number: DESY 13-209

Journal ref: Phys. Rev. D 89 (2014), 065002

arXiv:1112.6366 [pdf, ps, other]

doi 10.1103/PhysRevD.86.065026

BFKL approach and 2->5 MHV amplitude

Authors: J. Bartels, A. Kormilitzin, L. N. Lipatov, A. Prygarin

Abstract: We study the MHV amplitude for 2 -> 5 scattering in multi-Regge kinematics. The Mandelstam cut correction to the BDS amplitude is calculated in the leading logarithmic approximation (LLA) and the corresponding remainder function is given to any loop order in a closed integral form. We show that the LLA remainder function at two loops for the 2 -> 5 amplitude can be written as a sum of two 2 -> 4 remainder functions due to recursive properties of the leading order impact factors. We also make some generalizations for the MHV amplitudes with more external particles. The results of the present study are in agreement with the all-leg two-loop symbol derived by Caron-Huot, as shown in a parallel paper of one of the authors with collaborators.

Submitted 19 March, 2012; v1 submitted 29 December, 2011; originally announced December 2011.

Comments: 24 pages, 17 figures

arXiv:1106.3268 [pdf, ps, other]

doi 10.1016/j.nuclphysa.2011.09.021

Geometric scaling behavior of the scattering amplitude for DIS with nuclei

Authors: Andrey Kormilitzin, Eugene Levin, Sebastian Tapia

Abstract: The main question that we answer in this paper is whether the initial condition can influence the geometric scaling behavior of the amplitude for DIS at high energy. We rewrite the non-linear Balitsky-Kovchegov equation in a form which is useful for treating the interaction with nuclei. Using the simplified BFKL kernel, we find the analytical solution to this equation with the initial condition given by the McLerran-Venugopalan formula. This solution does not show the geometric scaling behavior of the amplitude deep in the saturation region. On the other hand, the BFKL Pomeron calculus with the initial condition at $x_A = 1/mR_A$ given by the solution to the Balitsky-Kovchegov equation leads to the geometric scaling behavior. The McLerran-Venugopalan formula is the natural initial condition for the Color Glass Condensate (CGC) approach. Therefore, our result makes it possible to check experimentally which approach, CGC or BFKL Pomeron calculus, is more adequate.

Submitted 16 June, 2011; originally announced June 2011.

Comments: 19pp, 11 figures in .eps files

Report number: TAUP 2929/11

arXiv:1011.1248 [pdf, ps, other]

doi 10.1016/j.nuclphysa.2011.05.009

On the Nuclear Modification Factor at RHIC and LHC

Authors: Andrey Kormilitzin, Eugene Levin, Amir H. Rezaeian

Abstract: We show that pQCD factorization incorporated with the pre-hadronization energy-loss effect naturally leads to flatness of the nuclear modification factor R_{AA} for produced hadrons at high transverse momentum p_T. We consider two possible scenarios for the pre-hadronization: In scenario 1, the produced gluon propagates through dense QCD medium and loses energy. In scenario 2, all gluons first decay to quark-antiquark pairs and then each pair loses energy while propagating through the medium. We show that the estimates of the energy loss in these two different models lead to very close values and are able to explain the suppression of high-p_T hadrons in nucleus-nucleus collisions at RHIC. We show that the onset of the flatness of R_{AA} for the produced hadron in central collisions at midrapidity is about p_T\approx 15 and 25 GeV at RHIC and the LHC energies, respectively. We show that the smallness (R_{AA}<0.5) and the high-p_T flatness of R_{AA} obtained from the k_T factorization supplemented with the Balitsky-Kovchegov (BK) equation is rather generic and does not strongly depend on the details of the BK solutions. We show that the energy-loss effect reduces the nuclear modification factor obtained from the k_T factorization by about 30-50% at moderate p_T.

Submitted 16 May, 2011; v1 submitted 4 November, 2010; originally announced November 2010.

Comments: 14 pages, 10 figures; v2: results unchanged, more discussion and references added. The version to appear in Nucl. Phys. A

Journal ref: Nucl.Phys.A860:84-101,2011

arXiv:1009.1468 [pdf, ps, other]

doi 10.1016/j.nuclphysa.2010.11.005

Non-linear equation: energy conservation and impact parameter dependence

Authors: Andrey Kormilitzin, Eugene Levin

Abstract: In this paper we address two questions: how energy conservation affects the solution to the non-linear equation, and how impact parameter dependence influences the inclusive production. Answering the first question, we solve the modified BK equation which takes into account energy conservation. In spite of the fact that we used the simplified kernel, we believe that the main result of the paper, the small ($\leq 40\%$) suppression of the inclusive production due to energy conservation, reflects a general feature. This result leads us to believe that the small value of the nuclear modification factor is of a non-perturbative nature. In the solution a new scale appears, $Q_{fr} = Q_s \exp(-1/(2 \bar{\alpha}_S))$, and the production of dipoles with a size larger than $2/Q_{fr}$ is suppressed. Therefore, we can expect that the typical temperature for hadron production is about $Q_{fr}$ ($T \approx Q_{fr}$). The simplified equation allows us to obtain a solution to the Balitsky-Kovchegov equation taking into account the impact parameter dependence. We show that the impact parameter ($b$) dependence can be absorbed into the non-perturbative $b$ dependence of the saturation scale. The solution of the BK equation, as well as of the modified BK equation without $b$ dependence, is only accurate up to $\pm 25\%$.

Submitted 8 September, 2010; originally announced September 2010.

Comments: 24 pp. 8 figures in eps files

Report number: TAUP 2920/10

Journal ref: Nucl.Phys.A849:98-119,2011

arXiv:1009.1329 [pdf, ps, other]

doi 10.1016/j.nuclphysa.2011.04.011

High density QCD and nucleus-nucleus scattering deeply in the saturation region

Authors: Andrey Kormilitzin, Eugene Levin, Jeremy S. Miller

Abstract: In this paper we solve the equations that describe nucleus-nucleus scattering, in high density QCD, in the framework of the BFKL Pomeron calculus. We found that (i) the contribution of short distances to the opacity for nucleus-nucleus scattering dies at high energies, (ii) the opacity tends to unity at high energy, and (iii) the main contribution that survives comes from soft (long distance) processes for large values of the impact parameter. The corrections to the opacity $Ω(Y,b) = 1$ were calculated and it turns out that they have a completely different form, namely ($1 - Ω \to \exp(-Const\,\sqrt{Y})$), than the opacity that stems from the Balitsky-Kovchegov equation, which is ($1 - Ω \to \exp(-Const\,Y^2)$). We reproduce the formula for the nucleus-nucleus cross section that is commonly used in the description of nucleus-nucleus scattering, and there is no reason why it should be correct in the Glauber-Gribov approach.

Submitted 4 July, 2011; v1 submitted 7 September, 2010; originally announced September 2010.

Comments: 25pp and 12 figures in eps format

Report number: TAUP 2919/10

Journal ref: Nucl.Phys.A859:87-113,2011

arXiv:0912.4689 [pdf, ps, other]

doi 10.1016/j.nuclphysa.2010.04.016

QCD motivated approach to soft interactions at high energies: nucleus-nucleus and hadron-nucleus collisions

Authors: E. Gotsman, A. Kormilitzin, E. Levin, U. Maor

Abstract: In this paper we consider nucleus-nucleus and hadron-nucleus reactions in the kinematic region $g A^{1/3} G_{3\mathbb{P}} \exp(ΔY) \approx 1$, $G^2_{3\mathbb{P}} \exp(ΔY) \approx 1$, where $G_{3\mathbb{P}}$ is the triple Pomeron coupling, $g$ is the vertex of the Pomeron-nucleon interaction, and $1 + Δ_{\mathbb{P}}$ denotes the Pomeron intercept. We find that in this kinematic region the traditional Glauber-Gribov eikonal approach is inadequate. We show that it is necessary to take into account inelastic Glauber corrections, which cannot be expressed in terms of the nucleon-nucleon scattering amplitudes. In the wide range of energies where $α'_{\mathbb{P}} Y \ll R^2_A$, the scattering amplitude for the nucleus-nucleus interaction does not depend on the details of the nucleon-nucleon interaction at high energy. In the formalism we present, the only (correlated) parameters that are required to describe the data are $Δ_{\mathbb{P}}$, $G_{3\mathbb{P}}$ and $g$. These parameters were taken from our description of the nucleon-nucleon data at high energies \cite{GLMM}. The predicted nucleus modification factor is compared with RHIC Au-Au data at $W = 200$ GeV. Estimates for LHC energies are presented and discussed.

Submitted 23 December, 2009; originally announced December 2009.

Comments: 18 pp., 14 figures

Report number: TAUP -2907-09

Journal ref: Nucl.Phys.A842:82-101,2010

arXiv:0811.0863 [pdf, ps, other]

doi 10.1063/1.3122183

A QCD motivated model for soft processes

Authors: A. Kormilitzin, E. Levin

Abstract: In this talk we give a brief description of a QCD motivated model for both hard and soft interactions at high energies. In this model the long distance behaviour of the scattering amplitude is determined by the dipole scattering amplitude in the saturation domain. All phenomenological parameters for the dipole-proton interaction were fitted from the deep inelastic scattering data, and the soft processes are described with only one new parameter, related to the wave function of the hadron. It turns out that we do not need to introduce the so-called soft Pomeron that has been used in high energy phenomenology for four decades.

Submitted 6 November, 2008; originally announced November 2008.

Comments: 5 pages, figures, talk at "Diffraction'08"

Report number: TAUP 2888-06

arXiv:0809.3886 [pdf, ps, other]

Soft processes at high energy without soft Pomeron: a QCD motivated model

Authors: A. Kormilitzin, E. Levin

Abstract: In this paper we develop a QCD motivated model for both hard and soft interactions at high energies. In this model the long distance behavior of the scattering amplitude is determined by the approximate solution to the non-linear evolution equation for the parton system in the saturation domain. All phenomenological parameters for the dipole-proton interaction were fitted from the deep inelastic scattering data, and the soft processes are described with only one new parameter, related to the wave function of the hadron. It turns out that we do not need to introduce the so-called soft Pomeron that has been used in high energy phenomenology for four decades. The model describes all data on soft interactions: the values of total, elastic and diffractive cross sections as well as their $s$ and $t$ behavior. The value for the survival probability of the diffractive Higgs production is calculated to be less than 1% for the LHC energy range.

Submitted 1 March, 2009; v1 submitted 23 September, 2008; originally announced September 2008.

Comments: 33 pages, 16 figures

Report number: TAUP 2884/08

arXiv:0807.3413 [pdf, ps, other]

doi 10.1016/j.nuclphysa.2008.09.006

Multiparticle production in the mean field approximation of high density QCD

Authors: Andrey Kormilitzin, Eugene Levin, Alex Prygarin

Abstract: A generating functional is suggested for multiparticle generation processes. In the mean field approximation of high density QCD, two equations for the new generating functional are derived: a linear functional equation for an arbitrary initial condition and a non-linear one for a specific initial condition. The non-linear equation has the form of the Kovchegov-Levin equation for diffractive production and generalizes it to processes with fixed multiplicities of produced particles.

Submitted 22 July, 2008; originally announced July 2008.

Comments: 11 pages, 5 figures

Report number: TAUP-2879/08

Journal ref: Nucl.Phys.A813:1-13,2008

arXiv:0707.2202 [pdf, ps, other]

Saturation model in the non-Glauber approach

Authors: Andrey Kormilitzin

Abstract: In this paper a new saturation model is presented. This model is based on the theoretical solution for the generating functional, and it is quite different from, and no more complicated than, the Glauber-like approach used before. The model describes the structure function F_{2} of the proton, as well as the diffractive structure function F_{2}^{D}. We show the difference between our model and the eikonal approach by calculating the multiplicity distribution, using the AGK cutting rules strategy.

Submitted 31 July, 2007; v1 submitted 15 July, 2007; originally announced July 2007.

Comments: 27 pages, 18 figures and one table, typos were corrected

arXiv:hep-ph/0702053 [pdf, ps, other]

doi 10.1140/epjc/s10052-007-0388-2

Survival probability for high mass diffraction

Authors: E. Gotsman, A. Kormilitzin, E. Levin, U. Maor

Abstract: Based on the calculation of survival probabilities, we discuss the problem of extracting the value of $G_{3P}$, the triple Pomeron 'bare' coupling constant, by comparing the large rapidity gap single high mass diffraction data in proton-proton scattering and $J/Ψ$ photo and DIS production. For p-p scattering, the calculation in a three amplitude rescattering eikonal model predicts the survival probability to be an order of magnitude smaller than for the two amplitude case. The survival probability calculation for photo and DIS $J/Ψ$ production is made in a dedicated model. In this process we show that, even though its survival probability is considerably larger than in p-p scattering, its value is below unity and cannot be neglected in the data analysis. We argue that, regardless of the uncertainties in the suggested procedure, its outcome is important both with regard to a realistic estimate of $G_{3P}$ and the survival probabilities relevant to LHC experiments.

Submitted 1 August, 2007; v1 submitted 5 February, 2007; originally announced February 2007.

Comments: 17 pages, 8 pictures and one table

Report number: TAUP -2846-07

Journal ref: Eur.Phys.J.C52:295-304,2007

Showing 1–29 of 29 results for author: Kormilitzin, A