The Combinatorial Data Fusion Problem in Conflicted-supervised Learning

Darling, R. W. R.; Harris, David G.; Phulara, Dev R.; Proos, John A.

Abstract: The best merge problem in industrial data science generates instances where disparate data sources place incompatible relational structures on the same set $V$ of objects. Graph vertex labelling data may include (1) missing or erroneous labels, (2) assertions that two vertices carry the same (unspecified) label, and (3) assertions that the vertices of some subset may not all carry the same label. Conflicted-supervised learning applies to cases where no labelling scheme satisfies (1), (2), and (3) simultaneously. Our rigorous formulation starts from a connected weighted graph $(V, E)$ and an independence system $\mathcal{S}$ on $V$, characterized by its circuits, called forbidden sets. Global incompatibility is expressed by the fact that $V \notin \mathcal{S}$. Combinatorial data fusion seeks a subset $E_1 \subset E$ of maximum edge weight such that no connected component of the subgraph $(V, E_1)$ contains a forbidden set. Multicut and multiway cut are special cases in which all forbidden sets have cardinality two. The general case exhibits unintuitive properties, illustrated by counterexamples. This first paper in a series concentrates on cases where $(V, E)$ is a tree, and presents an algorithm for general graphs in which the combinatorial data fusion problem is transferred to the Gomory-Hu tree, where it is solved using greedy set cover. Experimental results are given.
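
To make the problem statement concrete, the following is a minimal Python sketch of the feasibility condition and a brute-force search on a toy instance. It is not the authors' algorithm, which works through the Gomory-Hu tree and greedy set cover; the function names, the `(u, v, weight)` edge tuples, and the toy path graph are all assumptions introduced here for illustration. Forbidden sets are assumed to have at least two elements.

```python
from itertools import combinations


def components(vertices, edges):
    """Map each vertex to a representative of its connected component (union-find)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v, _weight in edges:
        parent[find(u)] = find(v)
    return {v: find(v) for v in vertices}


def is_feasible(vertices, edges, forbidden_sets):
    """True if no component of (V, edges) contains an entire forbidden set."""
    comp = components(vertices, edges)
    # A forbidden set is "broken" when its vertices span more than one component.
    return all(len({comp[v] for v in f}) > 1 for f in forbidden_sets)


def best_fusion_bruteforce(vertices, edges, forbidden_sets):
    """Try every edge subset and keep the heaviest feasible one (exponential; toy sizes only)."""
    best_weight, best_edges = float("-inf"), None
    for k in range(len(edges), -1, -1):
        for subset in combinations(edges, k):
            if is_feasible(vertices, subset, forbidden_sets):
                weight = sum(e[2] for e in subset)
                if weight > best_weight:
                    best_weight, best_edges = weight, list(subset)
    return best_weight, best_edges


# Hypothetical toy instance: the path a - b - c - d with one forbidden set of size three.
V = ["a", "b", "c", "d"]
E = [("a", "b", 3), ("b", "c", 1), ("c", "d", 2)]
forbidden = [{"a", "c", "d"}]

weight, kept = best_fusion_bruteforce(V, E, forbidden)
print(weight, kept)
```

On this instance the full edge set is infeasible because all of `a`, `c`, `d` lie in one component; dropping the lightest edge `("b", "c", 1)` separates `a` from `c` and `d`, leaving a kept weight of 5. When every forbidden set has exactly two elements, the same check reduces to the separation condition of multicut.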

Comments:	48 pages, 10 figures
Subjects:	Combinatorics (math.CO); Optimization and Control (math.OC)
MSC classes:	05C85
Cite as:	arXiv:1809.08723 [math.CO]
	(or arXiv:1809.08723v1 [math.CO] for this version)
	https://doi.org/10.48550/arXiv.1809.08723
