-
Approximating Submodular Matroid-Constrained Partitioning
Authors:
Kristóf Bérczi,
Karthekeyan Chandrasekaran,
Tamás Király,
Daniel P. Szabo
Abstract:
The submodular partitioning problem asks to minimize, over all partitions P of a ground set V, the sum of a given submodular function f over the parts of P. The problem has seen considerable work in approximability, as it encompasses multiterminal cuts on graphs, k-cuts on hypergraphs, and elementary linear algebra problems such as matrix multiway partitioning. This research has been divided between the fixed terminal setting, where we are given a set of terminals that must be separated by P, and the global setting, where the only constraint is the size of the partition. We investigate a generalization that unifies these two settings: minimum submodular matroid-constrained partition. In this problem, we are additionally given a matroid over the ground set and seek to find a partition P in which there exists some basis that is separated by P. We explore the approximability of this problem and its variants, reaching the state of the art for the special case of symmetric submodular functions, and provide results for monotone and general submodular functions as well.
Submitted 24 June, 2025;
originally announced June 2025.
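The objective described in the abstract above, the sum of a submodular function over the parts of a partition, can be illustrated with a minimal sketch. It assumes the graph cut function as the (symmetric) submodular function f; the names `cut_value`, `partition_cost`, and the toy graph are hypothetical and not taken from the paper, and the sketch only evaluates the objective for a given partition, it does not approximate the minimum.

```python
def cut_value(edges, part):
    """Total weight of edges with exactly one endpoint in `part`.

    The cut function of a weighted undirected graph is a standard
    example of a symmetric submodular function (assumed here as f).
    """
    part = set(part)
    return sum(w for (u, v, w) in edges if (u in part) != (v in part))


def partition_cost(edges, partition):
    """Sum of f over the parts of a partition, with f = cut_value."""
    return sum(cut_value(edges, part) for part in partition)


# Hypothetical toy instance: a weighted 4-cycle a-b-c-d-a.
edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 1), ("d", "a", 3)]
partition = [{"a", "d"}, {"b"}, {"c"}]

cost = partition_cost(edges, partition)
print(cost)
```

With the cut function, every edge crossing the partition is counted once for each of its two endpoints' parts, so the objective equals twice the weight of the crossing edges.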
-
From Voice to Value: Leveraging AI to Enhance Spoken Online Reviews on the Go
Authors:
Kavindu Ravishan,
Dániel Szabó,
Niels van Berkel,
Aku Visuri,
Chi-Lan Yang,
Koji Yatani,
Simo Hosio
Abstract:
Online reviews help people make better decisions. Review platforms usually depend on typed input, where leaving a good review requires significant effort because users must carefully organize and articulate their thoughts. This may discourage users from leaving comprehensive and high-quality reviews, especially when they are on the go. To address this challenge, we developed Vocalizer, a mobile application that enables users to provide reviews through voice input, with enhancements from a large language model (LLM). In a longitudinal study, we analysed user interactions with the app, focusing on AI-driven features that help refine and improve reviews. Our findings show that users frequently utilized the AI agent to add more detailed information to their reviews. We also show how interactive AI features can improve users' self-efficacy and willingness to share reviews online. Finally, we discuss the opportunities and challenges of integrating AI assistance into review-writing systems.
Submitted 10 December, 2024; v1 submitted 6 December, 2024;
originally announced December 2024.
-
Quantum property testing in sparse directed graphs
Authors:
Simon Apers,
Frédéric Magniez,
Sayantan Sen,
Dániel Szabó
Abstract:
We initiate the study of quantum property testing in sparse directed graphs, and more particularly in the unidirectional model, where the algorithm is allowed to query only the outgoing edges of a vertex.
In the classical unidirectional model the problem of testing $k$-star-freeness, and more generally $k$-source-subgraph-freeness, is almost maximally hard for large $k$. We prove that this problem has almost quadratic advantage in the quantum setting. Moreover, we prove that this advantage is nearly tight, by showing a quantum lower bound using the method of dual polynomials on an intermediate problem for a new, property testing version of the $k$-collision problem that was not studied before.
To illustrate that not all problems in graph property testing admit such a quantum speedup, we consider the problem of $3$-colorability in the related undirected bounded-degree model, when graphs are now undirected. This problem is maximally hard to test classically, and we show that also quantumly it requires a linear number of queries.
Submitted 7 October, 2024;
originally announced October 2024.
-
Multiway Cuts with a Choice of Representatives
Authors:
Kristóf Bérczi,
Tamás Király,
Daniel P. Szabo
Abstract:
In this paper, we study several generalizations of multiway cut where the terminals can be chosen as \emph{representatives} from sets of \emph{candidates} $T_1,\ldots,T_q$. In this setting, one is allowed to choose these representatives so that the minimum-weight cut separating these sets \emph{via their representatives} is as small as possible. We distinguish different cases depending on (A) whether the representative of a candidate set has to be separated from the other candidate sets completely or only from the representatives, and (B) whether there is a single representative for each candidate set or the choice of representative is independent for each pair of candidate sets. For fixed $q$, we give approximation algorithms for each of these problems that match the best known approximation guarantee for multiway cut. Our technical contribution is a new extension of the CKR relaxation that preserves approximation guarantees. For general $q$, we show $o(\log q)$-inapproximability for all cases where the choice of representatives may depend on the pair of candidate sets, as well as for the case where the goal is to separate a fixed node from a single representative from each candidate set. As a positive result, we give a $2$-approximation algorithm for the case where we need to choose a single representative from each candidate set. This is a generalization of the $(2-2/k)$-approximation for k-cut, and we can solve it by relating the tree case to optimization over a gammoid.
Submitted 4 July, 2024;
originally announced July 2024.
-
A Uniformly Random Solution to Algorithmic Redistricting
Authors:
Jin-Yi Cai,
Jacob Kruse,
Kenneth Mayer,
Daniel P. Szabo
Abstract:
The process of drawing electoral district boundaries is known as political redistricting. Within this context, gerrymandering is the practice of drawing these boundaries such that they unfairly favor a particular political party, often leading to unequal representation and skewed electoral outcomes. One of the few ways to detect gerrymandering is by algorithmically sampling redistricting plans. Previous methods mainly focus on sampling from some neighborhood of ``realistic'' districting plans, rather than a uniform sample of the entire space. We present a deterministic subexponential time algorithm to uniformly sample from the space of all possible $k$-partitions of a bounded degree planar graph, and with this construct a sample of the entire space of redistricting plans. We also give a way to restrict this sample space to plans that match certain compactness and population constraints at the cost of added complexity. The algorithm runs in $2^{O(\sqrt{n}\log n)}$ time, although we only give a heuristic implementation. Our method generalizes an algorithm to count self-avoiding walks on a square to count paths that split general planar graphs into $k$ regions, and uses this to sample from the space of all $k$-partitions of a planar graph.
Submitted 21 February, 2024;
originally announced February 2024.
-
Holey graphs: very large Betti numbers are testable
Authors:
Dániel Szabó,
Simon Apers
Abstract:
We show that the graph property of having a (very) large $k$-th Betti number $β_k$ for constant $k$ is testable with a constant number of queries in the dense graph model. More specifically, we consider a clique complex defined by an underlying graph and prove that for any $\varepsilon>0$, there exists $δ(\varepsilon,k)>0$ such that testing whether $β_k \geq (1-δ) d_k$ for $δ\leq δ(\varepsilon,k)$ reduces to tolerantly testing $(k+2)$-clique-freeness, which is known to be testable. This complements a result by Elek (2010) showing that Betti numbers are testable in the bounded-degree model. Our result combines the Euler characteristic, matroid theory and the graph removal lemma.
Submitted 18 February, 2025; v1 submitted 11 January, 2024;
originally announced January 2024.
-
Bringing Spatial Interaction Measures into Multi-Criteria Assessment of Redistricting Plans Using Interactive Web Mapping
Authors:
Jacob Kruse,
Song Gao,
Yuhan Ji,
Daniel P. Szabo,
Kenneth Mayer
Abstract:
Redistricting is the process by which electoral district boundaries are drawn, and a common normative assumption in this process is that districts should be drawn so as to capture coherent communities of interest (COIs). While states rely on various proxies for community illustration, such as compactness metrics and municipal split counts, to guide redistricting, recent legal challenges and scholarly works have shown the failings of such proxy measures and the difficulty of balancing multiple criteria in district plan creation. To address these issues, we propose the use of spatial interaction communities to directly quantify the degree to which districts capture the underlying COIs. Using large-scale human mobility flow data, we condense spatial interaction community capture for a set of districts into a single number, the interaction ratio (IR), which can be used for redistricting plan evaluation. To compare the IR to traditional redistricting criteria (compactness and fairness), and to explore the range of IR values found in valid districting plans, we employ a Markov chain-based regionalization algorithm (ReCom) to produce ensembles of valid plans, and calculate the degree to which they capture spatial interaction communities. Furthermore, we propose two methods for biasing the ReCom algorithm towards different IR values. We perform a multi-criteria assessment of the space of valid maps, and present the results in an interactive web map. The experiments on Wisconsin congressional districting plans demonstrate the effectiveness of our methods for biasing sampling towards higher or lower IR values. Furthermore, the analysis of the districts produced with these methods suggests that districts with higher IR and compactness values tend to produce district plans that are more proportional with regards to seats allocated to each of the two major parties.
Submitted 23 September, 2023;
originally announced September 2023.
-
A (simple) classical algorithm for estimating Betti numbers
Authors:
Simon Apers,
Sander Gribling,
Sayantan Sen,
Dániel Szabó
Abstract:
We describe a simple algorithm for estimating the $k$-th normalized Betti number of a simplicial complex over $n$ elements using the path integral Monte Carlo method. For a general simplicial complex, the running time of our algorithm is $n^{O\left(\frac{1}{\sqrtγ}\log\frac{1}{\varepsilon}\right)}$ with $γ$ measuring the spectral gap of the combinatorial Laplacian and $\varepsilon \in (0,1)$ the additive precision. In the case of a clique complex, the running time of our algorithm improves to $\left(n/λ_{\max}\right)^{O\left(\frac{1}{\sqrtγ}\log\frac{1}{\varepsilon}\right)}$ with $λ_{\max} \geq k$, where $λ_{\max}$ is the maximum eigenvalue of the combinatorial Laplacian. Our algorithm provides a classical benchmark for a line of quantum algorithms for estimating Betti numbers. On clique complexes it matches their running time when, for example, $γ\in Ω(1)$ and $k \in Ω(n)$.
Submitted 5 December, 2023; v1 submitted 17 November, 2022;
originally announced November 2022.
-
End-to-End Annotator Bias Approximation on Crowdsourced Single-Label Sentiment Analysis
Authors:
Gerhard Johann Hagerer,
David Szabo,
Andreas Koch,
Maria Luisa Ripoll Dominguez,
Christian Widmer,
Maximilian Wich,
Hannah Danner,
Georg Groh
Abstract:
Sentiment analysis is often a crowdsourcing task prone to subjective labels given by many annotators. It is not yet fully understood how the annotation bias of each annotator can be modeled correctly with state-of-the-art methods. However, resolving annotator bias precisely and reliably is the key to understanding annotators' labeling behavior and to successfully resolving corresponding individual misconceptions and wrongdoings regarding the annotation task. Our contribution is an explanation and improvement for precise neural end-to-end bias modeling and ground truth estimation, which reduces an undesired mismatch in that regard in the existing state of the art. Classification experiments show that it has the potential to improve accuracy in cases where each sample is annotated by only a single annotator. We provide the whole source code publicly and release our own domain-specific sentiment dataset containing 10,000 sentences discussing organic food products. These are crawled from social media and are singly labeled by 10 non-expert annotators.
Submitted 24 July, 2023; v1 submitted 3 November, 2021;
originally announced November 2021.
-
Quantum Inspired Adaptive Boosting
Authors:
Bálint Daróczy,
Katalin Friedl,
László Kabódi,
Attila Pereszlényi,
Dániel Szabó
Abstract:
Building on the quantum ensemble based classifier algorithm of Schuld and Petruccione [arXiv:1704.02146v1], we devise equivalent classical algorithms which show that this quantum ensemble method does not have advantage over classical algorithms. Essentially, we simplify their algorithm until it is intuitive to come up with an equivalent classical version. One of the classical algorithms is extremely simple and runs in constant time for each input to be classified. We further develop the idea and, as the main contribution of the paper, we propose methods inspired by combining the quantum ensemble method with adaptive boosting. The algorithms were tested and found to be comparable to the AdaBoost algorithm on publicly available data sets.
Submitted 1 February, 2021;
originally announced February 2021.
-
Pristine annotations-based multi-modal trained artificial intelligence solution to triage chest X-ray for COVID-19
Authors:
Tao Tan,
Bipul Das,
Ravi Soni,
Mate Fejes,
Sohan Ranjan,
Daniel Attila Szabo,
Vikram Melapudi,
K S Shriram,
Utkarsh Agrawal,
Laszlo Rusko,
Zita Herczeg,
Barbara Darazs,
Pal Tegzes,
Lehel Ferenczi,
Rakesh Mullick,
Gopal Avinash
Abstract:
The COVID-19 pandemic continues to spread and impact the well-being of the global population. The front-line modalities including computed tomography (CT) and X-ray play an important role for triaging COVID patients. Considering the limited access to resources (both hardware and trained personnel) and decontamination considerations, CT may not be ideal for triaging suspected subjects. Artificial intelligence (AI) assisted X-ray based applications for triaging and monitoring, which otherwise require experienced radiologists to identify COVID patients in a timely manner and to further delineate the disease region boundary, are seen as a promising solution. Our proposed solution differs from existing solutions by industry and academic communities, and demonstrates a functional AI model to triage by inferencing using a single X-ray image, while the deep-learning model is trained using both X-ray and CT data. We report on how such multi-modal training improves the solution compared to X-ray only training. The multi-modal solution increases the AUC (area under the receiver operating characteristic curve) from 0.89 to 0.93 and also positively impacts the Dice coefficient (0.59 to 0.62) for localizing the pathology. To the best of our knowledge, it is the first X-ray solution to leverage multi-modal information in its development.
Submitted 10 November, 2020;
originally announced November 2020.
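The Dice coefficient reported in the abstract above measures the overlap between a predicted region and a reference region. The following is a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), assuming regions are given as sets of pixel coordinates; the function name `dice` and the toy regions are hypothetical and do not come from the paper.

```python
def dice(pred, truth):
    """Dice coefficient between two regions given as collections of pixels.

    Returns 2 * |pred ∩ truth| / (|pred| + |truth|). By convention
    (an assumption here), two empty regions are treated as a perfect match.
    """
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


# Hypothetical toy regions: three pixels each, two of them shared.
predicted = {(0, 0), (0, 1), (1, 1)}
reference = {(0, 1), (1, 1), (2, 1)}

score = dice(predicted, reference)
print(round(score, 3))
```

A value of 1 means the regions coincide and 0 means they are disjoint, which is the scale on which the reported change from 0.59 to 0.62 should be read.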
-
An Automated Approach for the Discovery of Interoperability
Authors:
Duygu Sap,
Daniel P. Szabo
Abstract:
In this article, we present an automated approach that would test for and discover the interoperability of CAD systems based on the approximately-invariant shape properties of their models. We further show that exchanging models in standard format does not guarantee the preservation of shape properties. Our analysis is based on utilizing queries in deriving the shape properties and constructing the proxy models of the given CAD models [1]. We generate template files to accommodate the information necessary for the property computations and proxy model constructions, and implement an interoperability discovery program called DTest to execute the interoperability testing. We posit that our method could be extended to interoperability testing on CAD-to-CAE and/or CAD-to-CAM interactions by modifying the set of property checks and providing the additional requirements that may emerge in CAE or CAM applications.
Submitted 26 January, 2020;
originally announced January 2020.
-
Network Coding as a Service
Authors:
Dávid Szabó,
Attila Csoma,
Péter Megyesi,
András Gulyás,
Frank H. P. Fitzek
Abstract:
Network Coding (NC) shows great potential in various communication scenarios through changing the packet forwarding principles of current networks. It can not only improve throughput, latency, reliability and security but also alleviate the need for coordination in many cases. However, it is still controversial due to widespread misunderstandings on how to exploit its advantages. The aim of the paper is to facilitate the usage of NC by $(i)$ explaining how it can improve the performance of the network (regardless of the existence of any butterfly in the network), $(ii)$ showing how Software Defined Networking (SDN) can resolve the crucial problems of deployment and orchestration of NC elements, and $(iii)$ providing a prototype architecture with measurement results on the performance of our network coding capable software router implementation compared with fountain codes.
Submitted 13 January, 2016;
originally announced January 2016.
-
Deductive Way of Reasoning about the Internet AS Level Topology
Authors:
Dávid Szabó,
Attila Kőrösi,
József Bíró,
András Gulyás
Abstract:
Our current understanding of the AS level topology of the Internet is based on measurements and inductive-type models which set up rules describing the behavior (node and edge dynamics) of the individual ASes and generalize the consequences of these individual actions for the complete AS ecosystem using induction. In this paper we suggest a third, deductive approach in which we have premises for the whole AS system and the consequences of these premises are determined through deductive reasoning. We show that such a deductive approach can give complementary insights into the topological properties of the AS graph. While inductive models can mostly reflect high level statistics (e.g. degree distribution, clustering, diameter), deductive reasoning can identify omnipresent subgraphs and peering likelihood. We also propose a model, called YEAS, incorporating our deductive analytical findings, which produces topologies that contain both traditional and novel metrics for the AS level Internet.
Submitted 10 December, 2015;
originally announced December 2015.