-
Inverse matroid optimization under subset constraints
Authors:
Kristóf Bérczi,
Lydia Mirabel Mendoza-Cadena,
José Soto
Abstract:
In the Inverse Matroid problem, we are given a matroid, a fixed basis $B$, and an initial weight function, and the goal is to minimally modify the weights -- measured by some function -- so that $B$ becomes a maximum-weight basis. The problem arises naturally in settings where one wishes to explain or enforce a given solution by minimally perturbing the input.
We extend this classical problem by replacing the fixed basis with a subset $S_0$ of the ground set and imposing various structural constraints on the set of maximum-weight bases relative to $S_0$. Specifically, we study six variants: (A) Inverse Matroid Exists, where $S_0$ must contain at least one maximum-weight basis; (B) Inverse Matroid All, where all bases contained in $S_0$ are maximum-weight; and (C) Inverse Matroid Only, where $S_0$ contains exactly the maximum-weight bases, along with their natural negated counterparts.
For all variants, we develop combinatorial polynomial-time algorithms under the $\ell_\infty$-norm. A key ingredient is a refined min-max theorem for Inverse Matroid under the $\ell_\infty$-norm, which enables simpler and faster algorithms than previous approaches and may be of independent combinatorial interest. Our work significantly broadens the range of inverse optimization problems on matroids that can be solved efficiently, especially those that constrain the structure of optimal solutions through subset inclusion or exclusion.
Submitted 2 July, 2025; v1 submitted 1 July, 2025;
originally announced July 2025.
-
AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models
Authors:
Miguel Angel Peñaloza Perez,
Bruno Lopez Orozco,
Jesus Tadeo Cruz Soto,
Michelle Bruno Hernandez,
Miguel Angel Alvarado Gonzalez,
Sandra Malagon
Abstract:
Existing mathematical reasoning benchmarks are predominantly English-only or translation-based, which can introduce semantic drift and mask language-specific reasoning errors. To address this, we present AI4Math, a benchmark of 105 original university-level math problems natively authored in Spanish. The dataset spans seven advanced domains (Algebra, Calculus, Geometry, Probability, Number Theory, Combinatorics, and Logic), and each problem is accompanied by a step-by-step human solution. We evaluate six large language models (GPT 4o, GPT 4o mini, o3 mini, LLaMA 3.3 70B, DeepSeek R1 685B, and DeepSeek V3 685B) under four configurations: zero-shot and chain-of-thought, each in Spanish and English. The top models (o3 mini, DeepSeek R1 685B, DeepSeek V3 685B) achieve over 70% accuracy, whereas LLaMA 3.3 70B and GPT-4o mini remain below 40%. Most models show no significant performance drop between languages, with GPT 4o even performing better on Spanish problems in the zero-shot setting. Geometry, Combinatorics, and Probability questions remain persistently challenging for all models. These results highlight the need for native-language benchmarks and domain-specific evaluations to reveal reasoning failures not captured by standard metrics.
Submitted 25 May, 2025;
originally announced May 2025.
-
MBE-ARI: A Multimodal Dataset Mapping Bi-directional Engagement in Animal-Robot Interaction
Authors:
Ian Noronha,
Advait Prasad Jawaji,
Juan Camilo Soto,
Jiajun An,
Yan Gu,
Upinder Kaur
Abstract:
Animal-robot interaction (ARI) remains an unexplored challenge in robotics, as robots struggle to interpret the complex, multimodal communication cues of animals, such as body language, movement, and vocalizations. Unlike human-robot interaction, which benefits from established datasets and frameworks, animal-robot interaction lacks the foundational resources needed to facilitate meaningful bidirectional communication. To bridge this gap, we present MBE-ARI (Multimodal Bidirectional Engagement in Animal-Robot Interaction), a novel multimodal dataset that captures detailed interactions between a legged robot and cows. The dataset includes synchronized RGB-D streams from multiple viewpoints, annotated with body pose and activity labels across interaction phases, offering an unprecedented level of detail for ARI research. Additionally, we introduce a full-body pose estimation model tailored for quadruped animals, capable of tracking 39 keypoints with a mean average precision (mAP) of 92.7%, outperforming existing benchmarks in animal pose estimation. The MBE-ARI dataset and our pose estimation framework lay a robust foundation for advancing research in animal-robot interaction, providing essential tools for developing the perception, reasoning, and interaction frameworks needed for effective collaboration between robots and animals. The dataset and resources are publicly available at https://github.com/RISELabPurdue/MBE-ARI/, inviting further exploration and development in this critical area.
Submitted 11 April, 2025;
originally announced April 2025.
-
Matroid Secretary via Labeling Schemes
Authors:
Kristóf Bérczi,
Vasilis Livanos,
José Soto,
Victor Verdugo
Abstract:
The Matroid Secretary Problem (MSP) is one of the most prominent settings for online resource allocation and optimal stopping. A decision-maker is presented with a ground set of elements $E$ revealed sequentially and in random order. Upon arrival, an irrevocable decision is made in a take-it-or-leave-it fashion, subject to a feasibility constraint on the set of selected elements captured by a matroid defined over $E$. The decision-maker only has ordinal access to compare the elements, and the goal is to design an algorithm that selects every element of the optimal basis with probability at least $α$ (i.e., $α$-probability-competitive). While the existence of a constant probability-competitive algorithm for MSP remains a major open question, simple greedy policies are at the core of state-of-the-art algorithms for several matroid classes.
We introduce a flexible and general algorithmic framework to analyze greedy-like algorithms for MSP based on constructing a language associated with the matroid. Using this language, we establish a lower bound on the probability-competitiveness of the algorithm by studying a corresponding Poisson point process that governs the words' distribution in the language. Using our framework, we break the state-of-the-art guarantee for laminar matroids by settling the probability-competitiveness of the greedy-improving algorithm to be exactly $1-\ln(2) \approx 0.3068$. We also showcase the capabilities of our framework in graphic matroids, to show a probability-competitiveness of $0.2693$ for simple graphs and $0.2504$ for general graphs.
Submitted 30 May, 2025; v1 submitted 18 November, 2024;
originally announced November 2024.
-
Prophet Upper Bounds for Online Matching and Auctions
Authors:
José Soto,
Victor Verdugo
Abstract:
In the online 2-bounded auction problem, we have a collection of items represented as nodes in a graph and bundles of size two represented by edges. Agents are presented sequentially, each with a random weight function over the bundles. The goal of the decision-maker is to find an allocation of bundles to agents of maximum weight so that every item is assigned at most once, i.e., the solution is a matching in the graph. When the agents are single-minded (i.e., put all the weight in a single bundle), we recover the maximum weight prophet matching problem under edge arrivals (a.k.a. prophet matching).
In this work, we provide new and improved upper bounds on the competitiveness achievable by an algorithm for the general online 2-bounded auction and the (single-minded) prophet matching problems. For adversarial arrival order of the agents, we show that no algorithm for the online 2-bounded auction problem achieves a competitiveness larger than $4/11$, while no algorithm for prophet matching achieves a competitiveness larger than $\approx 0.4189$. Using a continuous-time analysis, we also improve the known bounds for online 2-bounded auctions under random-order arrivals to $\approx 0.5968$ in the general case, $\approx 0.6867$ in the IID model, and $\approx 0.6714$ in the prophet-secretary model.
Submitted 16 October, 2024;
originally announced October 2024.
-
Sports center customer segmentation: a case study
Authors:
Juan Soto,
Ramón Carmenaty,
Miguel Lastra,
Juan M. Fernández-Luna,
José M. Benítez
Abstract:
Customer segmentation is a fundamental process to develop effective marketing strategies, personalize customer experience and boost their retention and loyalty. This problem has been widely addressed in the scientific literature, yet no definitive solution for every case is available. A specific case study characterized by several individualizing features is thoroughly analyzed and discussed in this paper. Because of the case properties a robust and innovative approach to both data handling and analytical processes is required. The study led to a sound proposal for customer segmentation. The highlights of the proposal include a convenient data partition to decompose the problem, an adaptive distance function definition and its optimization through genetic algorithms. These comprehensive data handling strategies not only enhance the dataset reliability for segmentation analysis but also support the operational efficiency and marketing strategies of sports centers, ultimately improving the customer experience.
Submitted 23 May, 2024;
originally announced May 2024.
-
Set Selection with Uncertain Weights: Non-Adaptive Queries and Thresholds
Authors:
Christoph Dürr,
Arturo Merino,
José A. Soto,
José Verschae
Abstract:
We study set selection problems where the weights are uncertain. Instead of its exact weight, only an uncertainty interval containing its true weight is available for each element. In some cases, some solutions are universally optimal; i.e., they are optimal for every weight that lies within the uncertainty intervals. However, it may be that no universal optimal solution exists, unless we are revealed additional information on the precise values of some elements.
In the minimum cost admissible query problem, we are tasked to (non-adaptively) find a minimum-cost subset of elements that, no matter how they are revealed, guarantee the existence of a universally optimal solution.
We introduce thresholds under uncertainty to analyze problems of minimum cost admissible queries. Roughly speaking, for every element e, there is a threshold for its weight, below which e is included in all optimal solutions and a second threshold above which e is excluded from all optimal solutions.
We show that computing thresholds and finding minimum cost admissible queries are essentially equivalent problems. Thus, the analysis of the minimum admissible query problem reduces to the problem of computing thresholds.
We provide efficient algorithms for computing thresholds in the settings of minimum spanning trees, matroids, and matchings in trees; and NP-hardness results in the settings of s-t shortest paths and bipartite matching. By making use of the equivalence between the two problems these results translate into efficient algorithms for minimum cost admissible queries in the settings of minimum spanning trees, matroids, and matchings in trees; and NP-hardness results in the settings of s-t shortest paths and bipartite matching.
Submitted 26 April, 2024;
originally announced April 2024.
-
Online Combinatorial Assignment in Independence Systems
Authors:
Javier Marinkovic,
José A. Soto,
Victor Verdugo
Abstract:
We consider an online multi-weighted generalization of several classic online optimization problems, called the online combinatorial assignment problem. We are given an independence system over a ground set of elements and agents that arrive online one by one. Upon arrival, each agent reveals a weight function over the elements of the ground set. If the independence system is given by the matchings of a hypergraph we recover the combinatorial auction problem, where every node represents an item to be sold, and every edge represents a bundle of items. For combinatorial auctions, Kesselheim et al. showed upper bounds of $O(\log \log(k)/\log(k))$ and $O(\log \log(n)/\log(n))$ on the competitiveness of any online algorithm, even in the random order model, where $k$ is the maximum bundle size and $n$ is the number of items. We provide an exponential improvement on these upper bounds to show that the competitiveness of any online algorithm in the prophet IID setting is upper bounded by $O(\log(k)/k)$, and $O(\log(n)/\sqrt{n})$. Furthermore, using linear programming, we provide new and improved guarantees for the $k$-bounded online combinatorial auction problem (i.e., bundles of size at most $k$). We show a $(1-e^{-k})/k$-competitive algorithm in the prophet IID model, a $1/(k+1)$-competitive algorithm in the prophet-secretary model using a single sample per agent, and a $k^{-k/(k-1)}$-competitive algorithm in the secretary model. Our algorithms run in polynomial time and work in more general independence systems where the offline combinatorial assignment problem admits the existence of a polynomial-time randomized algorithm that we call certificate sampler. We show that certificate samplers have a nice interplay with random order models, and we also provide new polynomial-time competitive algorithms for some classes of matroids, matroid intersections, and matchoids.
Submitted 1 November, 2023;
originally announced November 2023.
-
QuOTeS: Query-Oriented Technical Summarization
Authors:
Juan Ramirez-Orta,
Eduardo Xamena,
Ana Maguitman,
Axel J. Soto,
Flavia P. Zanoto,
Evangelos Milios
Abstract:
When writing an academic paper, researchers often spend considerable time reviewing and summarizing papers to extract relevant citations and data to compose the Introduction and Related Work sections. To address this problem, we propose QuOTeS, an interactive system designed to retrieve sentences related to a summary of the research from a collection of potential references and hence assist in the composition of new papers. QuOTeS integrates techniques from Query-Focused Extractive Summarization and High-Recall Information Retrieval to provide Interactive Query-Focused Summarization of scientific documents. To measure the performance of our system, we carried out a comprehensive user study where participants uploaded papers related to their research and evaluated the system in terms of its usability and the quality of the summaries it produces. The results show that QuOTeS provides a positive user experience and consistently provides query-focused summaries that are relevant, concise, and complete. We share the code of our system and the novel Query-Focused Summarization dataset collected during our experiments at https://github.com/jarobyte91/quotes.
Submitted 20 June, 2023;
originally announced June 2023.
-
A Survey on Transactional Stream Processing
Authors:
Shuhao Zhang,
Juan Soto,
Volker Markl
Abstract:
Transactional stream processing (TSP) strives to create a cohesive model that merges the advantages of both transactional and stream-oriented guarantees. Over the past decade, numerous endeavors have contributed to the evolution of TSP solutions, uncovering similarities and distinctions among them. Despite these advances, a universally accepted standard approach for integrating transactional functionality with stream processing remains to be established. Existing TSP solutions predominantly concentrate on specific application characteristics and involve complex design trade-offs. This survey intends to introduce TSP and present our perspective on its future progression. Our primary goals are twofold: to provide insights into the diverse TSP requirements and methodologies, and to inspire the design and development of groundbreaking TSP systems.
Submitted 31 May, 2023; v1 submitted 21 August, 2022;
originally announced August 2022.
-
Approximation Algorithms for Vertex-Connectivity Augmentation on the Cycle
Authors:
Waldo Gálvez,
Francisco Sanhueza-Matamala,
José A. Soto
Abstract:
Given a $k$-vertex-connected graph $G$ and a set $S$ of extra edges (links), the goal of the $k$-vertex-connectivity augmentation problem is to find a set $S' \subseteq S$ of minimum size such that adding $S'$ to $G$ makes it $(k+1)$-vertex-connected. Unlike the edge-connectivity augmentation problem, research for the vertex-connectivity version has been sparse.
In this work we present the first polynomial time approximation algorithm that improves the known ratio of 2 for $2$-vertex-connectivity augmentation, for the case in which $G$ is a cycle. This is the first step for attacking the more general problem of augmenting a $2$-connected graph.
Our algorithm is based on local search and attains an approximation ratio of $1.8704$. To derive it, we prove novel results on the structure of minimal solutions.
Submitted 3 November, 2021;
originally announced November 2021.
-
Post-OCR Document Correction with large Ensembles of Character Sequence-to-Sequence Models
Authors:
Juan Ramirez-Orta,
Eduardo Xamena,
Ana Maguitman,
Evangelos Milios,
Axel J. Soto
Abstract:
In this paper, we propose a novel method based on character sequence-to-sequence models to correct documents already processed with Optical Character Recognition (OCR) systems. The main contribution of this paper is a set of strategies to accurately process strings much longer than the ones used to train the sequence model while being sample- and resource-efficient, supported by thorough experimentation. The strategy with the best performance involves splitting the input document in character n-grams and combining their individual corrections into the final output using a voting scheme that is equivalent to an ensemble of a large number of sequence models. We further investigate how to weigh the contributions from each one of the members of this ensemble. We test our method on nine languages of the ICDAR 2019 competition on post-OCR text correction and achieve a new state-of-the-art performance in five of them. Our code for post-OCR correction is shared at https://github.com/jarobyte91/post_ocr_correction.
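The windowed voting idea described above can be illustrated with a minimal sketch. It assumes a length-preserving corrector so that votes line up by character position; the function names `correct_with_voting` and `toy_corrector` are hypothetical and are not taken from the authors' repository, and the toy corrector merely stands in for a trained sequence-to-sequence model.

```python
from collections import Counter


def correct_with_voting(text, window, correct_window):
    """Correct `text` by sliding a character window over it and letting the
    corrections of all overlapping windows vote on each character position.

    Assumption for this sketch: `correct_window` returns a string of the same
    length as its input; windows that change length are skipped.
    """
    votes = [Counter() for _ in text]
    last_start = max(1, len(text) - window + 1)
    for start in range(last_start):
        chunk = text[start:start + window]
        fixed = correct_window(chunk)
        if len(fixed) != len(chunk):
            continue  # the sketch cannot align length-changing corrections
        for offset, ch in enumerate(fixed):
            votes[start + offset][ch] += 1
    # Each position takes its most voted character, or the original if no votes.
    return "".join(
        tally.most_common(1)[0][0] if tally else original
        for tally, original in zip(votes, text)
    )


def toy_corrector(chunk):
    """Hypothetical stand-in for a trained model: fixes one common OCR confusion."""
    return chunk.replace("0", "o")


corrected = correct_with_voting("hell0 w0rld", 4, toy_corrector)
```

Because every character is covered by several overlapping windows, a single bad window correction can be outvoted by its neighbours, which is the ensemble effect the abstract refers to.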
Submitted 24 January, 2022; v1 submitted 13 September, 2021;
originally announced September 2021.
-
Using Molecular Embeddings in QSAR Modeling: Does it Make a Difference?
Authors:
María Virginia Sabando,
Ignacio Ponzoni,
Evangelos E. Milios,
Axel J. Soto
Abstract:
With the consolidation of deep learning in drug discovery, several novel algorithms for learning molecular representations have been proposed. Despite the interest of the community in developing new methods for learning molecular embeddings and their theoretical benefits, comparing molecular embeddings with each other and with traditional representations is not straightforward, which in turn hinders the process of choosing a suitable representation for QSAR modeling. A reason behind this issue is the difficulty of conducting a fair and thorough comparison of the different existing embedding approaches, which requires numerous experiments on various datasets and training scenarios. To close this gap, we reviewed the literature on methods for molecular embeddings and reproduced three unsupervised and two supervised molecular embedding techniques recently proposed in the literature. We compared these five methods concerning their performance in QSAR scenarios using different classification and regression datasets. We also compared these representations to traditional molecular representations, namely molecular descriptors and fingerprints. As opposed to the expected outcome, our experimental setup consisting of over 25,000 trained models and statistical tests revealed that the predictive performance using molecular embeddings did not significantly surpass that of traditional representations. While supervised embeddings yielded competitive results compared to those using traditional molecular representations, unsupervised embeddings tended to perform worse than traditional representations. Our results highlight the need for conducting a careful comparison and analysis of the different embedding techniques prior to using them in drug design tasks, and motivate a discussion about the potential of molecular embeddings in computer-aided drug design.
Submitted 28 July, 2021; v1 submitted 20 March, 2021;
originally announced April 2021.
-
Sample-driven optimal stopping: From the secretary problem to the i.i.d. prophet inequality
Authors:
José Correa,
Andrés Cristi,
Boris Epstein,
José Soto
Abstract:
We take a unifying approach to single selection optimal stopping problems with random arrival order and independent sampling of items. In the problem we consider, a decision maker (DM) initially gets to sample each of $N$ items independently with probability $p$, and can observe the relative rankings of these sampled items. Then, the DM faces the remaining items in an online fashion, observing the relative rankings of all revealed items. While scanning the sequence the DM makes irrevocable stop/continue decisions and her reward for stopping the sequence facing the item with rank $i$ is $Y_i$. The goal of the DM is to maximize her reward. We start by studying the case in which the values $Y_i$ are known to the DM, and then move to the case in which these values are adversarial.
For the former case, we write the natural linear program that captures the performance of an algorithm, and take its continuous limit. We prove a structural result about this continuous limit, which allows us to reduce the problem to a relatively simple real optimization problem. We establish that the optimal algorithm is given by a sequence of thresholds $t_1\le t_2\le\cdots$ such that the DM should stop if seeing an item with current ranking $i$ after time $t_i$. Additionally, we are able to recover several classic results in the area, such as those for the secretary problem and the minimum ranking problem. For the adversarial case, we obtain a similar linear program with an additional stochastic dominance constraint. Using the same machinery, we are able to pin down the optimal competitive ratios for all values of $p$. Notably, we prove that as $p$ approaches 1, our guarantee converges linearly to 0.745, matching that of the i.i.d. prophet inequality. Also interesting is the case $p=1/2$, where our bound evaluates to $0.671$, which improves upon the state of the art.
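A rank-threshold stopping rule of the form described above can be sketched as follows. This is only an illustration under simplifying assumptions: arrival times lie in $[0,1]$, values are distinct, the sampling phase is not modeled, and the threshold values used in the example are hypothetical rather than the optimal ones derived in the paper. The function name `stop_with_rank_thresholds` is likewise an assumption.

```python
import math


def stop_with_rank_thresholds(arrivals, thresholds):
    """Return the index of the item at which the decision maker stops, or None.

    arrivals   : list of (time, value) pairs sorted by time.
    thresholds : nondecreasing list [t_1, t_2, ...]; an item whose current
                 relative rank is i (1 = best so far) is accepted only if it
                 arrives after time t_i. Ranks beyond the list are never accepted.
    """
    seen = []
    for index, (time, value) in enumerate(arrivals):
        # Current rank among everything revealed so far, including this item.
        rank = 1 + sum(1 for earlier in seen if earlier > value)
        seen.append(value)
        if rank <= len(thresholds) and time > thresholds[rank - 1]:
            return index
    return None


arrivals = [(0.1, 5), (0.3, 7), (0.5, 6), (0.8, 9)]

# A single threshold at 1/e mimics the classic secretary rule: skip early
# items, then accept the first item that is the best seen so far.
classic_choice = stop_with_rank_thresholds(arrivals, [1 / math.e])

# Two hypothetical thresholds: also accept a second-best-so-far item late enough.
two_level_choice = stop_with_rank_thresholds(arrivals, [0.4, 0.45])
```

With the single threshold the rule waits for the item at time 0.8, while the two-level rule already stops at time 0.5 on an item that is second best so far.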
Submitted 9 August, 2021; v1 submitted 12 November, 2020;
originally announced November 2020.
-
ChemVA: Interactive Visual Analysis of Chemical Compound Similarity in Virtual Screening
Authors:
María Virginia Sabando,
Pavol Ulbrich,
Matías Selzer,
Jan Byška,
Jan Mičan,
Ignacio Ponzoni,
Axel J. Soto,
María Luján Ganuza,
Barbora Kozlíková
Abstract:
In the modern drug discovery process, medicinal chemists deal with the complexity of analysis of large ensembles of candidate molecules. Computational tools, such as dimensionality reduction (DR) and classification, are commonly used to efficiently process the multidimensional space of features. These underlying calculations often hinder interpretability of results and prevent experts from assessing the impact of individual molecular features on the resulting representations. To provide a solution for scrutinizing such complex data, we introduce ChemVA, an interactive application for the visual exploration of large molecular ensembles and their features. Our tool consists of multiple coordinated views: Hexagonal view, Detail view, 3D view, Table view, and a newly proposed Difference view designed for the comparison of DR projections. These views display DR projections combined with biological activity, selected molecular features, and confidence scores for each of these projections. This conjunction of views allows the user to drill down through the dataset and to efficiently select candidate compounds. Our approach was evaluated on two case studies of finding structurally similar ligands with similar binding affinity to a target protein, as well as on an external qualitative evaluation. The results suggest that our system allows effective visual inspection and comparison of different high-dimensional molecular representations. Furthermore, ChemVA assists in the identification of candidate compounds while providing information on the certainty behind different molecular representations.
Submitted 30 August, 2020;
originally announced August 2020.
-
DeepBeat: A multi-task deep learning approach to assess signal quality and arrhythmia detection in wearable devices
Authors:
Jessica Torres Soto,
Euan Ashley
Abstract:
Wearable devices enable theoretically continuous, longitudinal monitoring of physiological measurements like step count, energy expenditure, and heart rate. Although the classification of abnormal cardiac rhythms such as atrial fibrillation from wearable devices has great potential, commercial algorithms remain proprietary and tend to focus on heart rate variability derived from green spectrum LED sensors placed on the wrist where noise remains an unsolved problem. Here, we develop a multi-task deep learning method to assess signal quality and arrhythmia event detection in wearable photoplethysmography devices for real-time detection of atrial fibrillation (AF). We train our algorithm on over one million simulated unlabeled physiological signals and fine-tune on a curated dataset of over 500K labeled signals from over 100 individuals from 3 different wearable devices. We demonstrate that in comparison with a traditional random forest-based approach (precision:0.24, recall:0.58, f1:0.34, auPRC:0.44) and a single task CNN (precision:0.59, recall:0.69, f1:0.64, auPRC:0.68) our architecture using unsupervised transfer learning through convolutional denoising autoencoders dramatically improves the performance of AF detection in participants at rest (pr:0.94, rc:0.98, f1:0.96, auPRC:0.96). In addition, we validate algorithm performance on a prospectively derived replication cohort of ambulatory subjects using data derived from an independently engineered device. We show that two-stage training can help address the unbalanced data problem common to biomedical applications where large well-annotated datasets are scarce. In conclusion, through a combination of simulation and transfer learning, we develop and apply a multitask architecture to the problem of AF detection from wearable wrist sensors, demonstrating high levels of accuracy and a solution for the vexing challenge of mechanical noise.
Submitted 25 January, 2020; v1 submitted 1 January, 2020;
originally announced January 2020.
-
The Two-Sided Game of Googol and Sample-Based Prophet Inequalities
Authors:
José Correa,
Andrés Cristi,
Boris Epstein,
José A. Soto
Abstract:
The secretary problem and the game of Googol are classic models for online selection problems that have received significant attention in the last five decades. We consider a variant of the problem and explore its connections to data-driven online selection. Specifically, we are given $n$ cards with arbitrary non-negative numbers written on both sides. The cards are randomly placed on $n$ consecutive positions on a table, and for each card, the visible side is also selected at random. The player sees the visible side of all cards and wants to select the card with the maximum hidden value. To this end, the player flips the first card, sees its hidden value, and decides whether to pick it or drop it and continue with the next card.
We study algorithms for two natural objectives. In the first one, as in the secretary problem, the player wants to maximize the probability of selecting the maximum hidden value. We show that this can be done with probability at least $0.45292$. In the second one, similar to the prophet inequality, the player maximizes the expectation of the selected hidden value. We show a guarantee of at least $0.63518$ with respect to the expected maximum hidden value.
Our algorithms result from combining three basic strategies. One is to stop whenever we see a value larger than the initial $n$ visible numbers. The second one is to stop the first time the last flipped card's value is the largest of the currently $n$ visible numbers in the table. And the third one is similar to the latter but it additionally requires that the last flipped value is larger than the value on the other side of its card.
We apply our results to the prophet secretary problem with unknown distributions, but with access to a single sample from each distribution. Our guarantee improves upon $1-1/e$ for this problem, which is the currently best known guarantee and only works for the i.i.d. case.
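As a rough illustration of the first basic strategy described in the abstract (stop at the first flipped value that exceeds all $n$ initially visible numbers), the following Python sketch simulates the game. The names `threshold_stop` and `simulate`, the representation of cards as pairs, and the seeded random generator are assumptions made for illustration. This is not the authors' combined algorithm and it makes no claim about the stated guarantees.

```python
import random


def threshold_stop(visible, hidden):
    """Return the index of the first card whose hidden value exceeds
    every initially visible value, or None if no such card exists."""
    threshold = max(visible)
    for i, h in enumerate(hidden):
        if h > threshold:
            return i
    return None


def simulate(cards, trials, seed=0):
    """Estimate how often threshold_stop picks the maximum hidden value.

    cards: list of (a, b) pairs, the two numbers written on each card.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        order = cards[:]
        rng.shuffle(order)              # random positions on the table
        visible, hidden = [], []
        for a, b in order:
            if rng.random() < 0.5:      # random visible side
                a, b = b, a
            visible.append(a)
            hidden.append(b)
        pick = threshold_stop(visible, hidden)
        if pick is not None and hidden[pick] == max(hidden):
            wins += 1
    return wins / trials
```

Calling `simulate([(1, 8), (3, 5), (2, 9)], trials=10000)` gives an empirical success frequency for this single strategy on one hypothetical instance.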
Submitted 12 July, 2019;
originally announced July 2019.
-
The minimum cost query problem on matroids with uncertainty areas
Authors:
Arturo I. Merino,
José A. Soto
Abstract:
We study the minimum weight basis problem on matroids when elements' weights are uncertain. For each element we only know a set of possible values (an uncertainty area) that contains its real weight. In some cases there exist bases that are uniformly optimal, that is, they are minimum weight bases for every possible weight function obeying the uncertainty areas. In other cases, computing such a basis is not possible unless we perform some queries for the exact value of some elements.
Our main result is a polynomial time algorithm for the following problem. Given a matroid with uncertainty areas and a query cost function on its elements, find the set of elements of minimum total cost that we need to simultaneously query such that, no matter their revelation, the resulting instance admits a uniformly optimal base. We also provide combinatorial characterizations of all uniformly optimal bases, when one exists; and of all sets of queries that can be performed so that after revealing the corresponding weights the resulting instance admits a uniformly optimal base.
Submitted 26 April, 2019;
originally announced April 2019.
-
Strong Algorithms for the Ordinal Matroid Secretary Problem
Authors:
José A. Soto,
Abner Turkieltaub,
Victor Verdugo
Abstract:
In the ordinal Matroid Secretary Problem (MSP), elements from a weighted matroid are presented in random order to an algorithm that must incrementally select a large weight independent set. However, the algorithm can only compare pairs of revealed elements without using their numerical values. An algorithm is $α$ probability-competitive if every element from the optimum appears with probability $1/α$ in the output. We present a technique to design algorithms with strong probability-competitive ratios, improving the guarantees for almost every matroid class considered in the literature: e.g., we get ratios of 4 for graphic matroids (improving on $2e$ by Korula and Pál [ICALP 2009]) and of 5.19 for laminar matroids (improving on 9.6 by Ma et al. [THEOR COMPUT SYST 2016]). We also obtain new results for superclasses of $k$ column sparse matroids, for hypergraphic matroids, certain gammoids and graph packing matroids, and a $1+O(\sqrt{\log ρ/ρ})$ probability-competitive algorithm for uniform matroids of rank $ρ$ based on Kleinberg's $1+O(\sqrt{1/ρ})$ utility-competitive algorithm [SODA 2005] for that class. Our second contribution consists of algorithms for the ordinal MSP on arbitrary matroids of rank $ρ$. We devise an $O(\log ρ)$ probability-competitive algorithm and an $O(\log\log ρ)$ ordinal-competitive algorithm, a weaker notion of competitiveness but stronger than the utility variant. These are based on the $O(\log\log ρ)$ utility-competitive algorithm by Feldman et al. [SODA 2015].
Submitted 6 February, 2018;
originally announced February 2018.
-
Robust randomized matchings
Authors:
Jannik Matuschke,
Martin Skutella,
José A. Soto
Abstract:
The following game is played on a weighted graph: Alice selects a matching $M$ and Bob selects a number $k$. Alice's payoff is the ratio of the weight of the $k$ heaviest edges of $M$ to the maximum weight of a matching of size at most $k$. If $M$ guarantees a payoff of at least $α$ then it is called $α$-robust. In 2002, Hassin and Rubinstein gave an algorithm that returns a $1/\sqrt{2}$-robust matching, which is best possible.
We show that Alice can improve her payoff to $1/\ln(4)$ by playing a randomized strategy. This result extends to a very general class of independence systems that includes matroid intersection, b-matchings, and strong 2-exchange systems. It also implies an improved approximation factor for a stochastic optimization variant known as the maximum priority matching problem and translates to an asymptotic robustness guarantee for deterministic matchings, in which Bob can only select numbers larger than a given constant. Moreover, we give a new LP-based proof of Hassin and Rubinstein's bound.
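To make the payoff definition concrete, here is a small brute-force Python sketch that computes the robustness of a given matching, that is, the minimum over $k$ of the ratio between the weight of its $k$ heaviest edges and the maximum weight of a matching of size at most $k$. The function names, the edge representation as `(u, v, weight)` triples, and the assumption of a simple graph with positive weights are all illustrative. The exhaustive enumeration is only practical for tiny graphs and is not the algorithm from the paper.

```python
from itertools import combinations


def is_matching(edges):
    """True if no two edges share an endpoint (simple graph assumed)."""
    seen = set()
    for u, v, _ in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def best_weight_up_to(edges, k):
    """Maximum weight of a matching with at most k edges (brute force)."""
    best = 0
    for size in range(1, k + 1):
        for subset in combinations(edges, size):
            if is_matching(subset):
                best = max(best, sum(w for _, _, w in subset))
    return best


def robustness(graph_edges, matching):
    """Worst-case payoff of `matching` over all choices of k by Bob."""
    weights = sorted((w for _, _, w in matching), reverse=True)
    ratios = []
    for k in range(1, len(graph_edges) + 1):
        opt = best_weight_up_to(graph_edges, k)
        if opt > 0:
            ratios.append(sum(weights[:k]) / opt)
    return min(ratios)
```

On a path with edge weights 2, 3, 2, the middle edge alone has robustness 3/4, while the two outer edges together have robustness 2/3.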
Submitted 18 May, 2017;
originally announced May 2017.
-
A Survey of State Management in Big Data Processing Systems
Authors:
Quoc-Cuong To,
Juan Soto,
Volker Markl
Abstract:
State management and its use in diverse applications varies widely across big data processing systems. This is evident in both the research literature and existing systems, such as Apache Flink, Apache Samza, Apache Spark, and Apache Storm. Given the pivotal role that state management plays in various use cases, in this survey, we present some of the most important uses of state as an enabler, discuss the alternative approaches used to handle and implement state, propose a taxonomy to capture the many facets of state management, and highlight new research directions. Our aim is to provide insight into disparate state management techniques, motivate others to pursue research in this area, and draw attention to some open problems.
Submitted 1 August, 2018; v1 submitted 6 February, 2017;
originally announced February 2017.
-
Symmetry exploitation for Online Machine Covering with Bounded Migration
Authors:
Waldo Gálvez,
José A. Soto,
José Verschae
Abstract:
Online models that allow recourse are highly effective in situations where classical models are too pessimistic. One such problem is the online machine covering problem on identical machines. In this setting, jobs arrive one by one and must be assigned to machines with the objective of maximizing the minimum machine load. When a job arrives, we are allowed to reassign some jobs as long as their total size is (at most) proportional to the processing time of the arriving job. The proportionality constant is called the migration factor of the algorithm. Using a rounding procedure with useful structural properties for online packing and covering problems, we design first a simple $(1.7 + \varepsilon)$-competitive algorithm using a migration factor of $O(1/\varepsilon)$ which maintains at every arrival a locally optimal solution with respect to the Jump neighborhood. After that, we present as our main contribution a more involved $(4/3+\varepsilon)$-competitive algorithm using a migration factor of $\tilde{O}(1/\varepsilon^3)$. At every arrival, we run an adaptation of the Largest Processing Time first (LPT) algorithm. Since the new job can cause a complete change of the assignment of smaller jobs in both cases, a low migration factor is achieved by carefully exploiting the highly symmetric structure obtained by the rounding procedure.
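For orientation, the classical offline Largest Processing Time first (LPT) rule mentioned in the abstract can be sketched in Python as follows: sort jobs by non-increasing size and give each one to the currently least-loaded machine. The function names `lpt_assign` and `min_load` are assumptions for illustration. This sketch covers only the textbook rule, not the authors' online adaptation, rounding procedure, or migration-factor analysis.

```python
import heapq


def lpt_assign(jobs, machines):
    """Assign jobs to identical machines with the offline LPT rule.

    Returns a list with one list of job sizes per machine.
    """
    # Heap of (current load, machine index); ties break on the index.
    loads = [(0, i) for i in range(machines)]
    heapq.heapify(loads)
    assignment = [[] for _ in range(machines)]
    for size in sorted(jobs, reverse=True):
        load, i = heapq.heappop(loads)      # least-loaded machine
        assignment[i].append(size)
        heapq.heappush(loads, (load + size, i))
    return assignment


def min_load(assignment):
    """Machine covering objective: the minimum machine load."""
    return min(sum(machine) for machine in assignment)
```

For example, `lpt_assign([7, 5, 4, 3, 3, 2], 3)` produces machine loads 9, 8, and 7, so the covering objective `min_load` is 7.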
Submitted 28 August, 2018; v1 submitted 6 December, 2016;
originally announced December 2016.
-
Independent sets and hitting sets of bicolored rectangular families
Authors:
José A. Soto,
Claudio Telha
Abstract:
A bicolored rectangular family (BRF) is a collection of all axis-parallel rectangles contained in a given region Z of the plane formed by selecting a bottom-left corner from a set A and an upper-right corner from a set B. We prove that the maximum independent set and the minimum hitting set of a BRF have the same cardinality and devise polynomial time algorithms to compute both. As a direct consequence, we obtain the first polynomial time algorithm to compute minimum biclique covers, maximum cross-free matchings and jump numbers in a class of bipartite graphs that significantly extends convex bipartite graphs and interval bigraphs. We also establish several connections between our work and other seemingly unrelated problems. Furthermore, when the bicolored rectangular family is weighted, we show that the problem of finding the maximum weight of an independent set is NP-hard, and provide efficient algorithms to solve it on certain subclasses.
Submitted 9 November, 2014;
originally announced November 2014.
-
TSP Tours in Cubic Graphs: Beyond 4/3
Authors:
José R. Correa,
Omar Larré,
José A. Soto
Abstract:
After a sequence of improvements, Boyd, Sitters, van der Ster, and Stougie proved that any 2-connected graph whose n vertices have degree 3, i.e., a cubic 2-connected graph, has a Hamiltonian tour of length at most (4/3)n, establishing in particular that the integrality gap of the subtour LP is at most 4/3 for cubic 2-connected graphs and matching the conjectured value of the famous 4/3 conjecture. In this paper we improve upon this result by designing an algorithm that finds a tour of length (4/3 - 1/61236)n, implying that cubic 2-connected graphs are among the few interesting classes of graphs for which the integrality gap of the subtour LP is strictly less than 4/3. With the previous result, and by considering an even smaller epsilon, we show that the integrality gap of the TSP relaxation is at most 4/3 - epsilon, even if the graph is not 2-connected (i.e., for cubic connected graphs), implying that the approximability threshold of the TSP in cubic graphs is strictly below 4/3. Finally, using similar techniques we show, as an additional result, that every Barnette graph admits a tour of length at most (4/3 - 1/18)n.
Submitted 7 October, 2013;
originally announced October 2013.
-
Independent and Hitting Sets of Rectangles Intersecting a Diagonal Line : Algorithms and Complexity
Authors:
José R. Correa,
Laurent Feuilloley,
Pablo Pérez-Lantero,
José A. Soto
Abstract:
Finding a maximum independent set (MIS) of a given family of axis-parallel rectangles is a basic problem in computational geometry and combinatorics. This problem has attracted significant attention since the sixties, when Wegner conjectured that the corresponding duality gap, i.e., the maximum possible ratio between the maximum independent set and the minimum hitting set (MHS), is bounded by a universal constant. An interesting special case, that may prove useful to tackling the general problem, is the diagonal-intersecting case, in which the given family of rectangles is intersected by a diagonal. Indeed, Chepoi and Felsner recently gave a factor 6 approximation algorithm for MHS in this setting, and showed that the duality gap is between 3/2 and 6. In this paper we improve upon these results. First we show that MIS in diagonal-intersecting families is NP-complete, providing one smallest subclass for which MIS is provably hard. Then, we derive an $O(n^2)$-time algorithm for the maximum weight independent set when, in addition the rectangles intersect below the diagonal. This improves and extends a classic result of Lubiw, and amounts to obtain a 2-approximation algorithm for the maximum weight independent set of rectangles intersecting a diagonal. Finally, we prove that for diagonal-intersecting families the duality gap is between 2 and 4. The upper bound, which implies an approximation algorithm of the same factor, follows from a simple combinatorial argument, while the lower bound represents the best known lower bound on the duality gap, even in the general case.
Submitted 3 January, 2014; v1 submitted 25 September, 2013;
originally announced September 2013.
-
Advances on Matroid Secretary Problems: Free Order Model and Laminar Case
Authors:
Patrick Jaillet,
José A. Soto,
Rico Zenklusen
Abstract:
The most well-known conjecture in the context of matroid secretary problems claims the existence of a constant-factor approximation applicable to any matroid. Whereas this conjecture remains open, modified forms of it were shown to be true, when assuming that the assignment of weights to the secretaries is not adversarial but uniformly random (Soto [SODA 2011], Oveis Gharan and Vondrák [ESA 2011]). However, so far, there was no variant of the matroid secretary problem with adversarial weight assignment for which a constant-factor approximation was found. We address this point by presenting a 9-approximation for the free order model, a model suggested shortly after the introduction of the matroid secretary problem, and for which no constant-factor approximation was known so far. The free order model is a relaxed version of the original matroid secretary problem, with the only difference that one can choose the order in which secretaries are interviewed.
Furthermore, we consider the classical matroid secretary problem for the special case of laminar matroids. Only recently, a constant-factor approximation has been found for this case, using a clever but rather involved method and analysis (Im and Wang, [SODA 2011]) that leads to a 16000/3-approximation. This is arguably the most involved special case of the matroid secretary problem for which a constant-factor approximation is known. We present a considerably simpler and stronger $3\sqrt{3}e\approx 14.12$-approximation, based on reducing the problem to a matroid secretary problem on a partition matroid.
Submitted 23 June, 2014; v1 submitted 5 July, 2012;
originally announced July 2012.
-
Generalizations and Variants of the Largest Non-crossing Matching Problem in Random Bipartite Graphs
Authors:
Marcos Kiwi,
José A. Soto
Abstract:
We are interested in the statistics of the length of the longest increasing subsequence of 2-rowed lexicographically sorted arrays chosen according to distinct families of distributions D = (D_n)_n, and when n goes to infinity. This framework encompasses well studied problems such as the so called Longest Increasing Subsequence problem, the Longest Common Subsequence problem, problems concerning directed bond percolation models, among others. We define several natural families of distinct distributions and characterize the asymptotic behavior of the expected length of a longest increasing subsequence chosen according to them. In particular, we consider generalizations to d-rowed arrays as well as symmetry restricted two-rowed arrays.
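For readers unfamiliar with the underlying quantity, the length of a longest strictly increasing subsequence of a plain sequence (the classical Longest Increasing Subsequence special case named in the abstract) can be computed with the standard patience-sorting technique. The Python sketch below is a generic textbook routine with an assumed function name `lis_length`. It does not model the 2-rowed or d-rowed arrays or the distribution families studied in the paper.

```python
from bisect import bisect_left


def lis_length(sequence):
    """Length of a longest strictly increasing subsequence.

    tails[i] holds the smallest possible last element of an
    increasing subsequence of length i + 1 seen so far.
    """
    tails = []
    for x in sequence:
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)     # x extends the longest subsequence
        else:
            tails[pos] = x      # x gives a smaller tail for length pos + 1
    return len(tails)
```

For the sequence 3, 1, 4, 1, 5, 9, 2, 6 the routine returns 4, matching for instance the subsequence 1, 4, 5, 9.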
Submitted 3 May, 2011;
originally announced May 2011.
-
A simple PTAS for Weighted Matroid Matching on Strongly Base Orderable Matroids
Authors:
José A. Soto
Abstract:
We give a simple polynomial time approximation scheme for the weighted matroid matching problem on strongly base orderable matroids. We also show that even the unweighted version of this problem is NP-complete and not in oracle-coNP.
Submitted 16 February, 2011;
originally announced February 2011.
-
Matroid Secretary Problem in the Random Assignment Model
Authors:
José A. Soto
Abstract:
In the Matroid Secretary Problem, introduced by Babaioff et al. [SODA 2007], the elements of a given matroid are presented to an online algorithm in random order. When an element is revealed, the algorithm learns its weight and decides whether or not to select it under the restriction that the selected elements form an independent set in the matroid. The objective is to maximize the total weight of the chosen elements. In the most studied version of this problem, the algorithm has no information about the weights beforehand. We refer to this as the zero information model. In this paper we study a different model, also proposed by Babaioff et al., in which the relative order of the weights is random in the matroid. To be precise, in the random assignment model, an adversary selects a collection of weights that are randomly assigned to the elements of the matroid. Later, the elements are revealed to the algorithm in a random order independent of the assignment.
Our main result is the first constant competitive algorithm for the matroid secretary problem in the random assignment model. This solves an open question of Babaioff et al. Our algorithm achieves a competitive ratio of $2e^2/(e-1)$. It exploits the notion of principal partition of a matroid, its decomposition into uniformly dense minors, and a $2e$-competitive algorithm for uniformly dense matroids we also develop. As additional results, we present simple constant competitive algorithms in the zero information model for various classes of matroids including cographic, low density and the case when every element is in a small cocircuit. In the same model, we also give a $ke$-competitive algorithm for $k$-column sparse linear matroids, and a new $O(\log r)$-competitive algorithm for general matroids of rank $r$ which only uses the relative order of the weights seen and not their numerical value, as previously needed.
Submitted 13 July, 2010;
originally announced July 2010.
-
Symmetric Submodular Function Minimization Under Hereditary Family Constraints
Authors:
Michel X. Goemans,
José A. Soto
Abstract:
We present an efficient algorithm to find non-empty minimizers of a symmetric submodular function over any family of sets closed under inclusion. This for example includes families defined by a cardinality constraint, a knapsack constraint, a matroid independence constraint, or any combination of such constraints. Our algorithm makes $O(n^3)$ oracle calls to the submodular function where $n$ is the cardinality of the ground set. In contrast, the problem of minimizing a general submodular function under a cardinality constraint is known to be inapproximable within $o(\sqrt{n/\log n})$ (Svitkina and Fleischer [2008]).
The algorithm is similar to an algorithm of Nagamochi and Ibaraki [1998] to find all nontrivial inclusionwise minimal minimizers of a symmetric submodular function over a set of cardinality $n$ using $O(n^3)$ oracle calls. Their procedure in turn is based on Queyranne's algorithm [1998] to minimize a symmetric submodular
Submitted 13 July, 2010;
originally announced July 2010.
-
Improved Analysis of a Max Cut Algorithm Based on Spectral Partitioning
Authors:
José Soto
Abstract:
Trevisan [SICOMP 2012] presented an algorithm for Max-Cut based on spectral partitioning techniques. This is the first algorithm for Max-Cut with an approximation guarantee strictly larger than 1/2 that is not based on semidefinite programming. Trevisan showed that its approximation ratio is at least 0.531. In this paper we improve this bound to 0.614247. We also define and extend this result for the more general Maximum Colored Cut problem.
Submitted 2 December, 2014; v1 submitted 5 October, 2009;
originally announced October 2009.