-
ANLS* -- A Universal Document Processing Metric for Generative Large Language Models
Authors:
David Peer,
Philemon Schöpf,
Volckmar Nebendahl,
Alexander Rietzler,
Sebastian Stabinger
Abstract:
Traditionally, discriminative models have been the predominant choice for tasks like document classification and information extraction. These models make predictions that fall into a limited number of predefined classes, facilitating a binary true or false evaluation and enabling the direct calculation of metrics such as the F1 score. However, recent advancements in generative large language models (GLLMs) have prompted a shift in the field due to their enhanced zero-shot capabilities, which eliminate the need for a downstream dataset and computationally expensive fine-tuning. Evaluating GLLMs is challenging, though, because the binary true or false evaluation used for discriminative models is not applicable to the predictions made by GLLMs.
This paper introduces ANLS*, a new metric for generative models that can be used to evaluate a wide variety of tasks, including information extraction and classification. The ANLS* metric extends existing ANLS metrics as a drop-in replacement and remains compatible with previously reported ANLS scores. An evaluation using the ANLS* metric, covering 7 datasets, more than 20 GLLMs, and 3 prompting methods, is also provided, demonstrating the importance of the proposed metric.
We also benchmark a novel approach to generate prompts for documents, called SFT, against other prompting techniques such as LATIN. In almost all cases, SFT outperforms other techniques and improves the state-of-the-art, sometimes by as much as 10 percentage points.
Sources are available at https://github.com/deepopinion/anls_star_metric
Submitted 22 April, 2025; v1 submitted 6 February, 2024;
originally announced February 2024.
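For orientation, the classic ANLS score that ANLS* is said to extend can be sketched as follows. This is a minimal illustration of the earlier string-level metric only, not of ANLS* itself; the function names are hypothetical, and the 0.5 threshold and case-insensitive comparison are assumptions based on common usage. The repository linked above holds the authors' reference implementation.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def nls(prediction: str, truth: str, threshold: float = 0.5) -> float:
    """Normalized Levenshtein similarity with a cut-off (threshold is an assumption)."""
    p, t = prediction.strip().lower(), truth.strip().lower()
    longest = max(len(p), len(t))
    if longest == 0:
        return 1.0  # two empty strings count as a perfect match
    distance = levenshtein(p, t) / longest
    # Answers that are too far from the truth receive no partial credit.
    return 1.0 - distance if distance < threshold else 0.0


def anls(predictions: List[str], ground_truths: List[List[str]],
         threshold: float = 0.5) -> float:
    """Average, over samples, of the best similarity against any accepted answer."""
    if not predictions:
        return 0.0
    scores = [max(nls(p, t, threshold) for t in truths)
              for p, truths in zip(predictions, ground_truths)]
    return sum(scores) / len(scores)
```

Unlike a binary true or false check, a near-miss string still earns partial credit under this scheme, while an answer that differs in at least half of its characters scores zero.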
-
A perspective on the current state-of-the-art of quantum computing for drug discovery applications
Authors:
Nick S. Blunt,
Joan Camps,
Ophelia Crawford,
Róbert Izsák,
Sebastian Leontica,
Arjun Mirani,
Alexandra E. Moylett,
Sam A. Scivier,
Christoph Sünderhauf,
Patrick Schopf,
Jacob M. Taylor,
Nicole Holzmann
Abstract:
Computational chemistry is an essential tool in the pharmaceutical industry. Quantum computing is a fast-evolving technology that promises to completely shift the computational capabilities in many areas of chemical research by bringing into reach currently impossible calculations. This perspective illustrates the near-future applicability of quantum computation to pharmaceutical problems. We briefly summarize and compare the scaling properties of state-of-the-art quantum algorithms, and provide novel estimates of the quantum computational cost of simulating progressively larger embedding regions of a pharmaceutically relevant covalent protein-drug complex involving the drug Ibrutinib. Carrying out these calculations requires an error-corrected quantum architecture, which we describe. Our estimates showcase that recent developments in quantum algorithms have dramatically reduced the quantum resources needed to run fully quantum calculations in active spaces of around 50 orbitals and electrons, from an estimated more than 1000 years using the Trotterisation approach to just a few days with sparse qubitisation, painting a picture of fast and exciting progress in this nascent field.
Submitted 20 March, 2023; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Quantum Computing in Pharma: A Multilayer Embedding Approach for Near Future Applications
Authors:
Robert Izsak,
Christoph Riplinger,
Nick S. Blunt,
Bernardo de Souza,
Nicole Holzmann,
Ophelia Crawford,
Joan Camps,
Frank Neese,
Patrick Schopf
Abstract:
Quantum computers are special-purpose machines that are expected to be particularly useful in simulating strongly correlated chemical systems. The quantum computer excels at treating a moderate number of orbitals within an active space in a fully quantum mechanical manner. We present a quantum phase estimation calculation on F$_2$ in a (2,2) active space on Rigetti's Aspen-11 QPU. While this is a promising start, it also underlines the need for carefully selecting the orbital spaces treated by the quantum computer. In this work, a scheme for selecting such an active space automatically is described, and simulated results obtained using both the quantum phase estimation (QPE) and variational quantum eigensolver (VQE) algorithms are presented and combined with a subtractive method to enable an accurate description of the environment. The active occupied space is selected from orbitals localized on the chemically relevant fragment of the molecule, while the corresponding virtual space is chosen based on the magnitude of interactions with the occupied space calculated from perturbation theory. This protocol is then applied to two chemical systems of pharmaceutical relevance: the enzyme [Fe] hydrogenase and the photosensitizer temoporfin. While the sizes of the active spaces currently amenable to a quantum computational treatment are not enough to demonstrate quantum advantage, the procedure outlined here is applicable to any active space size, including those that are outside the reach of classical computation.
Submitted 4 June, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.