-
A Novel Multilevel Taxonomical Approach for Describing High-Dimensional Unlabeled Movement Data
Authors:
Yashat Tavakoli,
Amilcar Soares,
Lourdes Pena
Abstract:
Movement data is prevalent across various applications and scientific fields, often characterized by its massive scale and complexity. Exploratory Data Analysis (EDA) plays a crucial role in summarizing and describing such data, enabling researchers to generate insights and support scientific hypotheses. Despite its importance, traditional EDA practices face limitations when applied to high-dimens…
▽ More
Movement data is prevalent across various applications and scientific fields, often characterized by its massive scale and complexity. Exploratory Data Analysis (EDA) plays a crucial role in summarizing and describing such data, enabling researchers to generate insights and support scientific hypotheses. Despite its importance, traditional EDA practices face limitations when applied to high-dimensional, unlabeled movement data. The complexity and multi-faceted nature of this type of data require more advanced methods that go beyond the capabilities of current EDA techniques. This study addresses the gap in current EDA practices by proposing a novel approach that leverages movement variable taxonomies and outlier detection. We hypothesize that organizing movement features into a taxonomy, and applying anomaly detection to combinations of taxonomic nodes, can reveal meaningful patterns and lead to more interpretable descriptions of the data. To test this hypothesis, we introduce TUMD, a new method that integrates movement taxonomies with outlier detection to enhance data analysis and interpretation. TUMD was evaluated across four diverse datasets of moving objects using fixed parameter values. Its effectiveness was assessed through two passes: the first pass categorized the majority of movement patterns as Kinematic, Geometric, or Hybrid for all datasets, while the second pass refined these behaviors into more specific categories such as Speed, Acceleration, or Indentation. TUMD met the effectiveness criteria in three datasets, demonstrating its ability to describe and refine movement behaviors. The results confirmed our hypothesis, showing that the combination of movement taxonomies and anomaly detection successfully uncovers meaningful and interpretable patterns within high-dimensional, unlabeled movement data.
△ Less
Submitted 28 April, 2025;
originally announced April 2025.
-
Emulating Human Cognitive Processes for Expert-Level Medical Question-Answering with Large Language Models
Authors:
Khushboo Verma,
Marina Moore,
Stephanie Wottrich,
Karla Robles López,
Nishant Aggarwal,
Zeel Bhatt,
Aagamjit Singh,
Bradford Unroe,
Salah Basheer,
Nitish Sachdeva,
Prinka Arora,
Harmanjeet Kaur,
Tanupreet Kaur,
Tevon Hood,
Anahi Marquez,
Tushar Varshney,
Nanfu Deng,
Azaan Ramani,
Pawanraj Ishwara,
Maimoona Saeed,
Tatiana López Velarde Peña,
Bryan Barksdale,
Sushovan Guha,
Satwant Kumar
Abstract:
In response to the pressing need for advanced clinical problem-solving tools in healthcare, we introduce BooksMed, a novel framework based on a Large Language Model (LLM). BooksMed uniquely emulates human cognitive processes to deliver evidence-based and reliable responses, utilizing the GRADE (Grading of Recommendations, Assessment, Development, and Evaluations) framework to effectively quantify…
▽ More
In response to the pressing need for advanced clinical problem-solving tools in healthcare, we introduce BooksMed, a novel framework based on a Large Language Model (LLM). BooksMed uniquely emulates human cognitive processes to deliver evidence-based and reliable responses, utilizing the GRADE (Grading of Recommendations, Assessment, Development, and Evaluations) framework to effectively quantify evidence strength. For clinical decision-making to be appropriately assessed, an evaluation metric that is clinically aligned and validated is required. As a solution, we present ExpertMedQA, a multispecialty clinical benchmark comprised of open-ended, expert-level clinical questions, and validated by a diverse group of medical professionals. By demanding an in-depth understanding and critical appraisal of up-to-date clinical literature, ExpertMedQA rigorously evaluates LLM performance. BooksMed outperforms existing state-of-the-art models Med-PaLM 2, Almanac, and ChatGPT in a variety of medical scenarios. Therefore, a framework that mimics human cognitive stages could be a useful tool for providing reliable and evidence-based responses to clinical inquiries.
△ Less
Submitted 17 October, 2023;
originally announced October 2023.
-
Bring the BitCODE -- Moving Compute and Data in Distributed Heterogeneous Systems
Authors:
Wenbin Lu,
Luis E. Peña,
Pavel Shamis,
Valentin Churavy,
Barbara Chapman,
Steve Poole
Abstract:
In this paper, we present a framework for moving compute and data between processing elements in a distributed heterogeneous system. The implementation of the framework is based on the LLVM compiler toolchain combined with the UCX communication framework. The framework can generate binary machine code or LLVM bitcode for multiple CPU architectures and move the code to remote machines while dynamic…
▽ More
In this paper, we present a framework for moving compute and data between processing elements in a distributed heterogeneous system. The implementation of the framework is based on the LLVM compiler toolchain combined with the UCX communication framework. The framework can generate binary machine code or LLVM bitcode for multiple CPU architectures and move the code to remote machines while dynamically optimizing and linking the code on the target platform. The remotely injected code can recursively propagate itself to other remote machines or generate new code. The goal of this paper is threefold: (a) to present an architecture and implementation of the framework that provides essential infrastructure to program a new class of disaggregated systems wherein heterogeneous programming elements such as compute nodes and data processing units (DPUs) are distributed across the system, (b) to demonstrate how the framework can be integrated with modern, high-level programming languages such as Julia, and (c) to demonstrate and evaluate a new class of eXtended Remote Direct Memory Access (X-RDMA) communication operations that are enabled by this framework. To evaluate the capabilities of the framework, we used a cluster with Fujitsu CPUs and heterogeneous cluster with Intel CPUs and BlueField-2 DPUs interconnected using high-performance RDMA fabric. We demonstrated an X-RDMA pointer chase application that outperforms an RDMA GET-based implementation by 70% and is as fast as Active Messages, but does not require function predeployment on remote platforms.
△ Less
Submitted 1 August, 2022;
originally announced August 2022.
-
Mechanizing Matching Logic In Coq
Authors:
Péter Bereczky,
Xiaohong Chen,
Dániel Horpácsi,
Lucas Peña,
Jan Tušil
Abstract:
Matching logic is a formalism for specifying, and reasoning about, mathematical structures, using patterns and pattern matching. Growing in popularity, it has been used to define many logical systems such as separation logic with recursive definitions and linear temporal logic. In addition, it serves as the logical foundation of the K semantic framework, which was used to build practical verifiers…
▽ More
Matching logic is a formalism for specifying, and reasoning about, mathematical structures, using patterns and pattern matching. Growing in popularity, it has been used to define many logical systems such as separation logic with recursive definitions and linear temporal logic. In addition, it serves as the logical foundation of the K semantic framework, which was used to build practical verifiers for a number of real-world languages. Despite being a fundamental formal system accommodating substantial theories, matching logic lacks a general-purpose, machine-checked formalization. Hence, we formalize matching logic using the Coq proof assistant. Specifically, we create a new representation of matching logic that uses a locally nameless encoding, and we formalize the syntax, semantics, and proof system of this representation in the Coq proof assistant. Crucially, we prove the soundness of the formalized proof system and provide a means to carry out interactive matching logic reasoning in Coq. We believe this work provides a previously unexplored avenue for reasoning about matching logic, its models, and the proof system.
△ Less
Submitted 21 September, 2022; v1 submitted 14 January, 2022;
originally announced January 2022.
-
UCX Programming Interface for Remote Function Injection and Invocation
Authors:
Luis E. Peña,
Wenbin Lu,
Pavel Shamis,
Steve Poole
Abstract:
Network library APIs have historically been developed with the emphasis on data movement, placement, and communication semantics. Many communication semantics are available across a large variety of network libraries, such as send-receive, data streaming, put/get/atomic, RPC, active messages, collective communication, etc. In this work we introduce new compute and data movement APIs that overcome…
▽ More
Network library APIs have historically been developed with the emphasis on data movement, placement, and communication semantics. Many communication semantics are available across a large variety of network libraries, such as send-receive, data streaming, put/get/atomic, RPC, active messages, collective communication, etc. In this work we introduce new compute and data movement APIs that overcome the constraints of the single-program, multiple-data (SPMD) programming model by allowing users to send binary executable code between processing elements. Our proof-of-concept implementation of the API is based on the UCX communication framework and leverages the RDMA network for fast compute migration. We envision the API being used to dispatch user functions from a host CPU to a SmartNIC (DPU), computational storage drive (CSD), or remote servers. In addition, the API can be used by large-scale irregular applications (such as semantic graph analysis), composed of many coordinating tasks operating on a data set so big that it has to be stored on many physical devices. In such cases, it may be more efficient to dynamically choose where code runs as the applications progresses.
△ Less
Submitted 12 October, 2021;
originally announced October 2021.
-
Two-Chains: High Performance Framework for Function Injection and Execution
Authors:
Megan Grodowitz,
Luis E. Peña,
Curtis Dunham,
Dong Zhong,
Pavel Shamis,
Steve Poole
Abstract:
Some important problems, such as semantic graph analysis, require large-scale irregular applications composed of many coordinating tasks that operate on a shared data set so big it has to be stored on many physical devices. In these cases, it may be more efficient to dynamically choose where code runs as the applications progresses. Many programming environments provide task migration or remote fu…
▽ More
Some important problems, such as semantic graph analysis, require large-scale irregular applications composed of many coordinating tasks that operate on a shared data set so big it has to be stored on many physical devices. In these cases, it may be more efficient to dynamically choose where code runs as the applications progresses. Many programming environments provide task migration or remote function calls, but they have sharp trade-offs between flexible composition, portability, performance, and code complexity.
We developed Two-Chains, a high performance framework inspired by active message communication semantics. We use the GNU Binutils, the ELF binary format, and the RDMA network protocol to provide ultra-low granularity distributed function composition at runtime in user space at HPC performance levels using C libraries. Our framework allows the direct injection of function binaries and data to a remote machine cache using the RDMA network. It interoperates seamlessly with existing C libraries using standard dynamic linking and load symbol resolution. We analyze function delivery and execution on cache stashing-enabled hardware and show that stashing decreases latency, increases message rates, and improves noise tolerance. This demonstrates one way this method is suited to increasingly network-oriented hardware architectures.
△ Less
Submitted 4 August, 2021;
originally announced August 2021.
-
Model-Guided Synthesis of Inductive Lemmas for FOL with Least Fixpoints
Authors:
Adithya Murali,
Lucas Peña,
Eion Blanchard,
Christof Löding,
P. Madhusudan
Abstract:
Recursively defined linked data structures embedded in a pointer-based heap and their properties are naturally expressed in pure first-order logic with least fixpoint definitions (FO+lfp) with background theories. Such logics, unlike pure first-order logic, do not admit even complete procedures. In this paper, we undertake a novel approach for synthesizing inductive hypotheses to prove validity in…
▽ More
Recursively defined linked data structures embedded in a pointer-based heap and their properties are naturally expressed in pure first-order logic with least fixpoint definitions (FO+lfp) with background theories. Such logics, unlike pure first-order logic, do not admit even complete procedures. In this paper, we undertake a novel approach for synthesizing inductive hypotheses to prove validity in this logic. The idea is to utilize several kinds of finite first-order models as counterexamples that capture the non-provability and invalidity of formulas to guide the search for inductive hypotheses. We implement our procedures and evaluate them extensively over theorems involving heap data structures that require inductive proofs and demonstrate the effectiveness of our methodology.
△ Less
Submitted 26 September, 2022; v1 submitted 21 September, 2020;
originally announced September 2020.
-
Los perfiles de investigación y su implantación en la Universidad Publica de Navarra
Authors:
Manuel Ruiz de Luzuriaga Peña,
Isabel Muñoz Mouriño,
Mercedes Bogino Larrambebere
Abstract:
This work aims to monitor and control the presence of UPNA research staff in the main research profiles platforms, not only in the most obvious ones such as Google Scholar Citation, Researcher ID, Scopus ID and ORCID, but also in other services that, in practice, they function as research profiles, such as Mendeley, Linkedin, ResearchGate, Academia.edu and Academica-e. We also find it interesting…
▽ More
This work aims to monitor and control the presence of UPNA research staff in the main research profiles platforms, not only in the most obvious ones such as Google Scholar Citation, Researcher ID, Scopus ID and ORCID, but also in other services that, in practice, they function as research profiles, such as Mendeley, Linkedin, ResearchGate, Academia.edu and Academica-e. We also find it interesting to analyze that presence and see how it responds to a variables, such as the department, gender, job category, research group. In this study we have excluded some platforms for different reasons. Dialnet profiles are entered from the UPNA library (BUPNA), which means that all those who meet the requirements for inclusion would be there, so their analysis does not make much sense, since it depends on factors outside the will of the researcher himself. The same is the case with the UPNA Scientific Production Portal (PPC): the data is entered from the Vicerrectorado de Investigación and should include all members of the UPNA PDI. Using as a base the census of university research staff provided by the Vicerrectorado de Investigación, it has been verified, for each author, the existence or not of a profile in the different services studied. The results have been tabulated in an Excel file to be able to analyze them later. The data has been collected in March 2018 for Orcid, ResearcherID, ScopusID, Google Scholar Citations and Mendeley. In November 2018, data from Academica-e, Academia.edu, ResearchGate and Linkedin were taken. For each of the profiles, a search by institutional affiliation was used, when possible, to obtain a first list of UPNA research personnel with that profile. Subsequently, a search was carried out, person by person, of the rest of the research staff that did not appear in that first list.
△ Less
Submitted 29 May, 2020;
originally announced May 2020.
-
Towards a Verified Model of the Algorand Consensus Protocol in Coq
Authors:
Musab A. Alturki,
Jing Chen,
Victor Luchangco,
Brandon Moore,
Karl Palmskog,
Lucas Peña,
Grigore Roşu
Abstract:
The Algorand blockchain is a secure and decentralized public ledger based on pure proof of stake rather than proof of work. At its core it is a novel consensus protocol with exactly one block certified in each round: that is, the protocol guarantees that the blockchain does not fork. In this paper, we report on our effort to model and formally verify the Algorand consensus protocol in the Coq proo…
▽ More
The Algorand blockchain is a secure and decentralized public ledger based on pure proof of stake rather than proof of work. At its core it is a novel consensus protocol with exactly one block certified in each round: that is, the protocol guarantees that the blockchain does not fork. In this paper, we report on our effort to model and formally verify the Algorand consensus protocol in the Coq proof assistant. Similar to previous consensus protocol verification efforts, we model the protocol as a state transition system and reason over reachable global states. However, in contrast to previous work, our model explicitly incorporates timing issues (e.g., timeouts and network delays) and adversarial actions, reflecting a more realistic environment faced by a public blockchain. Thus far, we have proved asynchronous safety of the protocol: two different blocks cannot be certified in the same round, even when the adversary has complete control of message delivery in the network. We believe that our model is sufficiently general and other relevant properties of the protocol such as liveness can be proved for the same model.
△ Less
Submitted 25 August, 2020; v1 submitted 11 July, 2019;
originally announced July 2019.
-
A First-Order Logic with Frames
Authors:
Adithya Murali,
Lucas Peña,
Christof Löding,
P. Madhusudan
Abstract:
We propose a novel logic, called Frame Logic (FL), that extends first-order logic (with recursive definitions) using a construct Sp(.) that captures the implicit supports of formulas -- the precise subset of the universe upon which their meaning depends. Using such supports, we formulate proof rules that facilitate frame reasoning elegantly when the underlying model undergoes change. We show that…
▽ More
We propose a novel logic, called Frame Logic (FL), that extends first-order logic (with recursive definitions) using a construct Sp(.) that captures the implicit supports of formulas -- the precise subset of the universe upon which their meaning depends. Using such supports, we formulate proof rules that facilitate frame reasoning elegantly when the underlying model undergoes change. We show that the logic is expressive by capturing several data-structures and also exhibit a translation from a precise fragment of separation logic to frame logic. Finally, we design a program logic based on frame logic for reasoning with programs that dynamically update heaps that facilitates local specifications and frame reasoning. This program logic consists of both localized proof rules as well as rules that derive the weakest tightest preconditions in FL.
△ Less
Submitted 26 September, 2022; v1 submitted 25 January, 2019;
originally announced January 2019.
-
Procedural Generation of Videos to Train Deep Action Recognition Networks
Authors:
César Roberto de Souza,
Adrien Gaidon,
Yohann Cabon,
Antonio Manuel López Peña
Abstract:
Deep learning for human action recognition in videos is making significant progress, but is slowed down by its dependency on expensive manual labeling of large video collections. In this work, we investigate the generation of synthetic training data for action recognition, as it has recently shown promising results for a variety of other computer vision tasks. We propose an interpretable parametri…
▽ More
Deep learning for human action recognition in videos is making significant progress, but is slowed down by its dependency on expensive manual labeling of large video collections. In this work, we investigate the generation of synthetic training data for action recognition, as it has recently shown promising results for a variety of other computer vision tasks. We propose an interpretable parametric generative model of human action videos that relies on procedural generation and other computer graphics techniques of modern game engines. We generate a diverse, realistic, and physically plausible dataset of human action videos, called PHAV for "Procedural Human Action Videos". It contains a total of 39,982 videos, with more than 1,000 examples for each action of 35 categories. Our approach is not limited to existing motion capture sequences, and we procedurally define 14 synthetic actions. We introduce a deep multi-task representation learning architecture to mix synthetic and real videos, even if the action categories differ. Our experiments on the UCF101 and HMDB51 benchmarks suggest that combining our large set of synthetic videos with small real-world datasets can boost recognition performance, significantly outperforming fine-tuning state-of-the-art unsupervised generative models of videos.
△ Less
Submitted 19 July, 2017; v1 submitted 2 December, 2016;
originally announced December 2016.
-
Who can replace Xavi? A passing motif analysis of football players
Authors:
Javier López Peña,
Raúl Sánchez Navarro
Abstract:
Traditionally, most of football statistical and media coverage has been focused almost exclusively on goals and (ocassionally) shots. However, most of the duration of a football game is spent away from the boxes, passing the ball around. The way teams pass the ball around is the most characteristic measurement of what a team's "unique style" is.
In the present work we analyse passing sequences a…
▽ More
Traditionally, most of football statistical and media coverage has been focused almost exclusively on goals and (ocassionally) shots. However, most of the duration of a football game is spent away from the boxes, passing the ball around. The way teams pass the ball around is the most characteristic measurement of what a team's "unique style" is.
In the present work we analyse passing sequences at the player level, using the different passing frequencies as a "digital fingerprint" of a player's style. The resulting numbers provide an adequate feature set which can be used in order to construct a measure of similarity between players. Armed with such a similarity tool, one can try to answer the question: Who might possibly replace Xavi at FC Barcelona?
△ Less
Submitted 23 June, 2015;
originally announced June 2015.