-
The Combinatorial Data Fusion Problem in Conflicted-supervised Learning
Authors:
R. W. R. Darling,
David G. Harris,
Dev R. Phulara,
John A. Proos
Abstract:
The best merge problem in industrial data science generates instances where disparate data sources place incompatible relational structures on the same set $V$ of objects. Graph vertex labelling data may include (1) missing or erroneous labels, (2) assertions that two vertices carry the same (unspecified) label, and (3) assertions that some subset of vertices cannot all carry the same label. Conflicted-supervised learning applies to cases where no labelling scheme satisfies (1), (2), and (3). Our rigorous formulation starts from a connected weighted graph $(V, E)$, and an independence system $\mathcal{S}$ on $V$, characterized by its circuits, called forbidden sets. Global incompatibility is expressed by the fact that $V \notin \mathcal{S}$. Combinatorial data fusion seeks a subset $E_1 \subset E$ of maximum edge weight so that no vertex component of the subgraph $(V, E_1)$ contains any forbidden set. Multicut and multiway cut are special cases where all forbidden sets have cardinality two. The general case exhibits unintuitive properties, shown in counterexamples. This paper, the first in a series, concentrates on cases where $(V, E)$ is a tree, and presents an algorithm on general graphs, in which the combinatorial data fusion problem is transferred to the Gomory-Hu tree, where it is solved using greedy set cover. Experimental results are given.
Submitted 23 September, 2018;
originally announced September 2018.
-
Subhypergraphs in non-uniform random hypergraphs
Authors:
Megan Dewar,
John Healy,
Xavier Pérez-Giménez,
Paweł Prałat,
John Proos,
Benjamin Reiniger,
Kirill Ternovsky
Abstract:
In this paper we focus on the problem of finding (small) subhypergraphs in a (large) hypergraph. We use this problem to illustrate that reducing hypergraph problems to graph problems by working with the 2-section is not always a reasonable approach. We begin by defining a generalization of the binomial random graph model to hypergraphs and formalizing several definitions of subhypergraph. The bulk of the paper focusses on determining the expected existence of these types of subhypergraph in random hypergraphs. We also touch on the problem of determining whether a given subgraph appearing in the 2-section is likely to have been induced by a certain subhypergraph in the hypergraph. To evaluate the model in relation to real-world data, we compare model prediction to two datasets with respect to (1) the existence of certain small subhypergraphs, and (2) a clustering coefficient.
Submitted 13 March, 2018; v1 submitted 22 March, 2017;
originally announced March 2017.
-
Connectivity in Hypergraphs
Authors:
Megan Dewar,
David Pike,
John Proos
Abstract:
In this paper we consider two natural notions of connectivity for hypergraphs: weak and strong. We prove that the strong vertex connectivity of a connected hypergraph is bounded by its weak edge connectivity, thereby extending a theorem of Whitney from graphs to hypergraphs. We find that while determining a minimum weak vertex cut can be done in polynomial time and is equivalent to finding a minimum vertex cut in the 2-section of the hypergraph in question, determining a minimum strong vertex cut is NP-hard for general hypergraphs. Moreover, the problem of finding minimum strong vertex cuts remains NP-hard when restricted to hypergraphs with maximum edge size at most 3. We also discuss the relationship between strong vertex connectivity and the minimum transversal problem for hypergraphs, showing that there are classes of hypergraphs for which one of the problems is NP-hard while the other can be solved in polynomial time.
Submitted 2 February, 2018; v1 submitted 21 November, 2016;
originally announced November 2016.
-
Shor's discrete logarithm quantum algorithm for elliptic curves
Authors:
John Proos,
Christof Zalka
Abstract:
We show in some detail how to implement Shor's efficient quantum algorithm for discrete logarithms for the particular case of elliptic curve groups. It turns out that for this problem a smaller quantum computer can solve problems further beyond current computing than for integer factorisation. A 160-bit elliptic curve cryptographic key could be broken on a quantum computer using around 1000 qubits, while factoring the security-wise equivalent 1024-bit RSA modulus would require about 2000 qubits. In this paper we only consider elliptic curves over GF($p$) and not yet the equally important ones over GF($2^n$) or other finite fields. The main technical difficulty is to implement Euclid's gcd algorithm to compute multiplicative inverses modulo $p$. As the runtime of Euclid's algorithm depends on the input, one difficulty encountered is the ``quantum halting problem''.
Submitted 22 January, 2004; v1 submitted 25 January, 2003;
originally announced January 2003.