
Formal Models of Active Learning from Contrastive Examples

Authors: Farnam Mansouri, Hans U. Simon, Adish Singla, Yuxin Chen, Sandra Zilles

Abstract: Machine learning can greatly benefit from providing learning algorithms with pairs of contrastive training examples -- typically pairs of instances that differ only slightly, yet have different class labels. Intuitively, the difference in the instances helps explain the difference in the class labels. This paper proposes a theoretical framework in which the effect of various types of contrastive examples on active learners is studied formally. The focus is on the sample complexity of learning concept classes and how it is influenced by the choice of contrastive examples. We illustrate our results with geometric concept classes and classes of Boolean functions. Interestingly, we reveal a connection between learning from contrastive examples and the classical model of self-directed learning.

Submitted 18 June, 2025; originally announced June 2025.

arXiv:2506.13655

The Word Problem for Products of Symmetric Groups

Authors: Hans U. Simon

Abstract: The word problem for products of symmetric groups (WPPSG) is a well-known NP-complete problem. An input instance of this problem consists of "specification sets" $X_1,\ldots,X_m \subseteq \{1,\ldots,n\}$ and a permutation $\tau$ on $\{1,\ldots,n\}$. The sets $X_1,\ldots,X_m$ specify a subset of the symmetric group $\mathcal{S}_n$ and the question is whether the given permutation $\tau$ is a member of this subset. We discuss three subproblems of WPPSG and show that they can be solved efficiently. The subproblem WPPSG$_0$ is the restriction of WPPSG to specification sets all of which are sets of consecutive integers. The subproblem WPPSG$_1$ is the restriction of WPPSG to specification sets which have the Consecutive Ones Property. The subproblem WPPSG$_2$ is the restriction of WPPSG to specification sets which have what we call the Weak Consecutive Ones Property. WPPSG$_1$ is more general than WPPSG$_0$ and WPPSG$_2$ is more general than WPPSG$_1$. But the efficient algorithms that we use for solving WPPSG$_1$ and WPPSG$_2$ have, as a subroutine, the efficient algorithm for solving WPPSG$_0$.

Submitted 16 June, 2025; originally announced June 2025.

Comments: 24 pages, 3 figures

MSC Class: 68Q25

ACM Class: F.2.2

arXiv:2503.14061

The Hierarchy of Saturating Matching Numbers

Authors: Hans U. Simon, Jan Arne Telle

Abstract: In this paper, we study three matching problems all of which came up quite recently in the field of machine teaching. The cost of a matching is defined in such a way that, for some formal model of teaching, it equals (or bounds) the number of labeled examples needed to solve a given teaching task. We show how the cost parameters associated with these problems depend on each other and how they are related to other well-known combinatorial parameters (for instance, the VC-dimension).

Submitted 18 March, 2025; originally announced March 2025.

Comments: 22 pages, 1 figure, 2 tables

MSC Class: 68R05

ACM Class: G.2.1

arXiv:2502.09453

RTD-Conjecture and Concept Classes Induced by Graphs

Authors: Hans U. Simon

Abstract: It is conjectured that the recursive teaching dimension of any finite concept class is upper-bounded by the VC-dimension of this class times a universal constant. In this paper, we confirm this conjecture for two rich families of concept classes where each class is induced by some graph $G$. For each $G$, we consider the class whose concepts represent stars in $G$ as well as the class whose concepts represent connected sets in $G$. We show that, for concept classes of this kind, the recursive teaching dimension either equals the VC-dimension or is less by $1$.

Submitted 13 February, 2025; originally announced February 2025.

Comments: 19 pages, 2 figures

MSC Class: 68R05 (primary) 05C99 (secondary)

ACM Class: F.1.3; G.2.1; I.2.6

arXiv:2405.19066

A Note on the Subcubes of the $n$-Cube

Authors: Hans Ulrich Simon

Abstract: In the year 1990, Béla Bollobás, Imre Leader and Andrew Radcliffe considered the following combinatorial problem: given three parameters k, n and q, find a set of k vertices in the binary n-cube which contains a maximal number of q-dimensional subcubes. It was shown that an optimal solution is given by the k vertices which coincide with the binary representations of the numbers 0, 1, ..., k-1. Two proofs were presented. The proof given by Bollobás and Leader is particularly elegant and short. Here we show that the other proof, the one given by Bollobás and Radcliffe, also becomes quite simple and short when it is combined with a lemma from Graham whose publication dates back to 1970. As a second application of Graham's lemma, we solve a recursive equation (related to the optimization problem that we discussed before) that might be considered interesting in its own right.

Submitted 4 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: 7 pages, no figures

MSC Class: 68R05

ACM Class: G.2.1
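
The optimization problem described in the abstract above can be explored by brute force on very small cubes. The sketch below counts the q-dimensional subcubes contained in a vertex set of the binary n-cube and compares the initial segment 0, 1, ..., k-1 against every k-subset. The names `count_subcubes` and `initial_segment_is_optimal`, and the encoding of vertices as integers, are assumptions made for this illustration and are not taken from the paper.

```python
from itertools import combinations


def count_subcubes(vertices, n, q):
    """Count q-dimensional subcubes of the n-cube lying entirely in `vertices`.

    Vertices are integers in range(2**n), read as n-bit binary strings.
    """
    vs = set(vertices)
    total = 0
    for free in combinations(range(n), q):
        # Bitmask of the q coordinates that vary inside the subcube.
        mask = sum(1 << i for i in free)
        for base in vs:
            if base & mask:
                continue  # use the corner with all free bits 0 as representative
            # Enumerate all submasks of `mask`, i.e. all corners of the subcube.
            sub = mask
            contained = True
            while True:
                if (base | sub) not in vs:
                    contained = False
                    break
                if sub == 0:
                    break
                sub = (sub - 1) & mask
            total += contained
    return total


def initial_segment_is_optimal(n, k, q):
    """Exhaustively compare {0, ..., k-1} with all k-subsets (tiny n only)."""
    best = max(count_subcubes(s, n, q) for s in combinations(range(2 ** n), k))
    return count_subcubes(range(k), n, q) == best
```

For example, `count_subcubes(range(4), 3, 1)` counts the edges of the 3-cube spanned by the vertices 0 to 3. The exhaustive comparison grows exponentially and is only meant for sanity checks with n around 3 or 4.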

arXiv:2402.06729

Greedy Matchings in Bipartite Graphs with Ordered Vertex Sets

Authors: Hans U. Simon

Abstract: We define and study greedy matchings in vertex-ordered bipartite graphs. It is shown that each vertex-ordered bipartite graph has a unique greedy matching. The proof uses (a weak form of) Newman's lemma. The vertex ordering is called a preference relation. Given a vertex-ordered bipartite graph, the goal is to match every vertex of one vertex class but to leave unmatched as many vertices of low preference as possible in the other vertex class. We investigate how well greedy algorithms perform in this setting. It is shown that they have optimal performance provided that the vertex-ordering is cleverly chosen. The study of greedy matchings is motivated by problems in learning theory like illustrating or teaching concepts by means of labeled examples.

Submitted 9 February, 2024; originally announced February 2024.

Comments: 10 pages, no figures

MSC Class: 68R05 (primary) 68Q32 (secondary)

arXiv:2307.05252

MAP- and MLE-Based Teaching

Authors: Hans Ulrich Simon, Jan Arne Telle

Abstract: Imagine a learner L who tries to infer a hidden concept from a collection of observations. Building on the work [4] of Ferri et al., we assume the learner to be parameterized by priors P(c) and by c-conditional likelihoods P(z|c) where c ranges over all concepts in a given class C and z ranges over all observations in an observation set Z. L is called a MAP-learner (resp. an MLE-learner) if it thinks of a collection S of observations as a random sample and returns the concept with the maximum a-posteriori probability (resp. the concept which maximizes the c-conditional likelihood of S). Depending on whether L assumes that S is obtained from ordered or unordered sampling resp. from sampling with or without replacement, we can distinguish four different sampling modes. Given a target concept c in C, a teacher for a MAP-learner L aims at finding a smallest collection of observations that causes L to return c. This approach leads in a natural manner to various notions of a MAP- or MLE-teaching dimension of a concept class C. Our main results are: We show that this teaching model has some desirable monotonicity properties. We clarify how the four sampling modes are related to each other. As for the (important!) special case, where concepts are subsets of a domain and observations are 0,1-labeled examples, we obtain some additional results. First of all, we characterize the MAP- and MLE-teaching dimension associated with an optimally parameterized MAP-learner graph-theoretically. From this central result, some other ones are easy to derive. It is shown, for instance, that the MLE-teaching dimension is either equal to the MAP-teaching dimension or exceeds the latter by 1. It is shown furthermore that these dimensions can be bounded from above by the so-called antichain number, the VC-dimension and related combinatorial parameters. Moreover, they can be computed in polynomial time.

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2205.08357

Minimum Tournaments with the Strong $S_k$-Property and Implications for Teaching

Authors: Hans Ulrich Simon

Abstract: A tournament is said to have the $S_k$-property if, for any set of $k$ players, there is another player who beats them all. Minimum tournaments having this property were thoroughly explored in the 1960s and the early 1970s. In this paper, we define a strengthening of the $S_k$-property that we name "strong $S_k$-property". We show, first, that several basic results on the weaker notion remain valid for the stronger notion (and the corresponding modification of the proofs requires only little extra effort). Second, it is demonstrated that the stronger notion has applications in the area of Teaching. Specifically, we present an infinite family of concept classes all of which can be taught with a single example in the No-Clash model of teaching while, in order to teach a class $\mathcal{C}$ of this family in the recursive model of teaching, on the order of $\log|\mathcal{C}|$ examples are required. This is the first paper that presents a concrete and easily constructible family of concept classes which separates the No-Clash from the recursive model of teaching by more than a constant factor. The separation by a logarithmic factor is remarkable because the recursive teaching dimension is known to be bounded by $\log |\mathcal{C}|$ for any concept class $\mathcal{C}$.

Submitted 17 May, 2022; originally announced May 2022.

Comments: 9 pages, 0 figures
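
The plain $S_k$-property, as defined in the first sentence of the abstract above, can be checked by brute force. The sketch below is only an illustration: the function names, the boolean-matrix encoding `beats[p][q]` and the two example tournaments are assumptions chosen for this example. The strong $S_k$-property introduced in the paper is not defined in the abstract and is not modeled here.

```python
from itertools import combinations


def has_sk_property(beats, k):
    """Return True if every set of k players is beaten by some other player.

    `beats[p][q]` is True when player p beats player q (hypothetical encoding).
    """
    players = range(len(beats))
    for group in combinations(players, k):
        dominated = any(
            p not in group and all(beats[p][q] for q in group)
            for p in players
        )
        if not dominated:
            return False
    return True


def cyclic_tournament(n, wins):
    """Tournament on n players where i beats j iff (j - i) mod n is in `wins`."""
    return [[(j - i) % n in wins for j in range(n)] for i in range(n)]


# Three players beating each other in a cycle.
three_cycle = cyclic_tournament(3, {1})
# Seven players; {1, 2, 4} are the nonzero quadratic residues modulo 7.
qr7 = cyclic_tournament(7, {1, 2, 4})
```

Here `has_sk_property(three_cycle, 1)` and `has_sk_property(qr7, 2)` hold, while `has_sk_property(three_cycle, 2)` does not. The check enumerates all k-subsets, so it is only practical for small tournaments.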

arXiv:2205.02792

Tournaments, Johnson Graphs, and NC-Teaching

Authors: Hans U. Simon

Abstract: Quite recently a teaching model, called "No-Clash Teaching" or simply "NC-Teaching", has been suggested that is provably optimal in the following strong sense. First, it satisfies Goldman and Mathias' collusion-freeness condition. Second, the NC-teaching dimension (= NCTD) is smaller than or equal to the teaching dimension with respect to any other collusion-free teaching model. It has also been shown that any concept class which has NC-teaching dimension $d$ and is defined over a domain of size $n$ can have at most $2^d \binom{n}{d}$ concepts. The main results in this paper are as follows. First, we characterize the maximum concept classes of NC-teaching dimension $1$ as classes which are induced by tournaments (= complete oriented graphs) in a very natural way. Second, we show that there exists a family $(\mathcal{C}_n)_{n\ge1}$ of concept classes such that the well-known recursive teaching dimension (= RTD) of $\mathcal{C}_n$ grows logarithmically in $n = |\mathcal{C}_n|$ while, for every $n\ge1$, the NC-teaching dimension of $\mathcal{C}_n$ equals $1$. Since the recursive teaching dimension of a finite concept class $\mathcal{C}$ is generally bounded by $\log|\mathcal{C}|$, the family $(\mathcal{C}_n)_{n\ge1}$ separates RTD from NCTD in the most striking way. The proof of existence of the family $(\mathcal{C}_n)_{n\ge1}$ makes use of the probabilistic method and random tournaments. Third, we improve the aforementioned upper bound $2^d\binom{n}{d}$ by a factor of order $\sqrt{d}$. The verification of the superior bound makes use of Johnson graphs and maximum subgraphs not containing large narrow cliques.

Submitted 5 May, 2022; originally announced May 2022.

Comments: 12 pages, 0 figures

ACM Class: G.2.1

arXiv:1903.04012

Optimal Collusion-Free Teaching

Authors: David Kirkpatrick, Hans U. Simon, Sandra Zilles

Abstract: Formal models of learning from teachers need to respect certain criteria to avoid collusion. The most commonly accepted notion of collusion-freeness was proposed by Goldman and Mathias (1996), and various teaching models obeying their criterion have been studied. For each model $M$ and each concept class $\mathcal{C}$, a parameter $M$-$\mathrm{TD}(\mathcal{C})$ refers to the teaching dimension of concept class $\mathcal{C}$ in model $M$, defined to be the number of examples required for teaching a concept, in the worst case over all concepts in $\mathcal{C}$. This paper introduces a new model of teaching, called no-clash teaching, together with the corresponding parameter $\mathrm{NCTD}(\mathcal{C})$. No-clash teaching is provably optimal in the strong sense that, given any concept class $\mathcal{C}$ and any model $M$ obeying Goldman and Mathias's collusion-freeness criterion, one obtains $\mathrm{NCTD}(\mathcal{C})\le M$-$\mathrm{TD}(\mathcal{C})$. We also study a corresponding notion $\mathrm{NCTD}^+$ for the case of learning from positive data only, establish useful bounds on $\mathrm{NCTD}$ and $\mathrm{NCTD}^+$, and discuss relations of these parameters to the VC-dimension and to sample compression. In addition to formulating an optimal model of collusion-free teaching, our main results are on the computational complexity of deciding whether $\mathrm{NCTD}^+(\mathcal{C})=k$ (or $\mathrm{NCTD}(\mathcal{C})=k$) for given $\mathcal{C}$ and $k$. We show some such decision problems to be equivalent to the existence question for certain constrained matchings in bipartite graphs. Our NP-hardness results for the latter are of independent interest in the study of constrained graph matchings.

Submitted 10 March, 2019; originally announced March 2019.

Comments: 26 pages and 6 figures. This is an expanded version of a similarly titled paper to appear in Proceedings of Machine Learning Research (ALT 2019), vol. 98, 2019

ACM Class: I.2.6

arXiv:1710.04533

On the Containment Problem for Linear Sets

Authors: Hans U. Simon

Abstract: It is well known that the containment problem (as well as the equivalence problem) for semilinear sets is $\log$-complete in $\Pi_2^p$. It had been shown quite recently that already the containment problem for multi-dimensional linear sets is $\log$-complete in $\Pi_2^p$ (where hardness even holds for a unary encoding of the numerical input parameters). In this paper, we show that already the containment problem for $1$-dimensional linear sets (with binary encoding of the numerical input parameters) is $\log$-hard (and therefore also $\log$-complete) in $\Pi_2^p$. However, combining both restrictions (dimension $1$ and unary encoding), the problem becomes solvable in polynomial time.

Submitted 20 February, 2018; v1 submitted 12 October, 2017; originally announced October 2017.

Comments: 15 pages

MSC Class: 68Q17

arXiv:1702.02047

Preference-based Teaching

Authors: Ziyuan Gao, Christoph Ries, Hans Ulrich Simon, Sandra Zilles

Abstract: We introduce a new model of teaching named "preference-based teaching" and a corresponding complexity parameter, the preference-based teaching dimension (PBTD), representing the worst-case number of examples needed to teach any concept in a given concept class. Although the PBTD coincides with the well-known recursive teaching dimension (RTD) on finite classes, it is radically different on infinite ones: the RTD becomes infinite already for trivial infinite classes (such as half-intervals) whereas the PBTD evaluates to reasonably small values for a wide collection of infinite classes including classes consisting of so-called closed sets w.r.t. a given closure operator, including various classes related to linear sets over $\mathbb{N}_0$ (whose RTD had been studied quite recently) and including the class of Euclidean half-spaces. On top of presenting these concrete results, we provide the reader with a theoretical framework (of a combinatorial flavor) which helps to derive bounds on the PBTD.

Submitted 8 February, 2017; v1 submitted 6 February, 2017; originally announced February 2017.

Comments: 35 pages

Showing 1–12 of 12 results for author: Simon, H U