-
Assessing Generative Models for Structured Data
Authors:
Reilly Cannon,
Nicolette M. Laird,
Caesar Vazquez,
Andy Lin,
Amy Wagler,
Tony Chiang
Abstract:
Synthetic tabular data generation has emerged as a promising method to address limited data availability and privacy concerns. With the sharp increase in the performance of large language models in recent years, researchers have been interested in applying these models to the generation of tabular data. However, little is known about the quality of the generated tabular data from large language models. The predominant method for assessing the quality of synthetic tabular data is the train-synthetic-test-real approach, where the artificial examples are compared to the original by how well machine learning models, trained separately on the real and synthetic sets, perform in some downstream tasks. This method does not directly measure how closely the distribution of generated data approximates that of the original. This paper introduces rigorous methods for directly assessing synthetic tabular data against real data by looking at inter-column dependencies within the data. We find that large language models (GPT-2), both when queried via few-shot prompting and when fine-tuned, and GAN (CTGAN) models do not produce data with dependencies that mirror the original real data. Results from this study can inform future practice in synthetic data generation to improve data quality.
Submitted 26 March, 2025;
originally announced March 2025.
-
On full-separating sets in graphs
Authors:
Dipayan Chakraborty,
Annegret K. Wagler
Abstract:
Several different types of identification problems have already been studied in the literature, where the objective is to distinguish any two vertices of a graph by their unique neighborhoods in a suitably chosen dominating or total-dominating set of the graph, often referred to as a \emph{code}. To study such problems under a unifying point of view, reformulations of the already studied problems in terms of covering problems in suitably constructed hypergraphs have been provided. Analyzing these hypergraph representations, we introduce a new separation property, called \emph{full-separation}, which has not yet been considered in the literature. We study it in combination with both domination and total-domination, and call the resulting codes \emph{full-separating-dominating codes} (or \emph{FD-codes} for short) and \emph{full-separating-total-dominating codes} (or \emph{FTD-codes} for short), respectively. We address the conditions for the existence of FD- and FTD-codes, bounds for their size and their relation to codes of the other types. We show that the problems of determining an FD- or an FTD-code of minimum cardinality in a graph are NP-hard. We also show that the cardinalities of minimum FD- and FTD-codes differ by at most one, but that it is NP-complete to decide if they are equal for a given graph in general. We find the exact values of minimum cardinalities of the FD- and FTD-codes on some familiar graph classes like paths, cycles, half-graphs and spiders. This helps us compare the two codes with other codes on these graph families, thereby exhibiting extremal cases for several lower bounds.
Submitted 15 July, 2024;
originally announced July 2024.
-
Open-separating dominating codes in graphs
Authors:
Dipayan Chakraborty,
Annegret K. Wagler
Abstract:
Using dominating sets to separate vertices of graphs is a well-studied problem in the larger domain of identification problems. In such problems, the objective is to choose a suitable dominating set $C$ of a graph $G$ such that the neighbourhoods of all vertices of $G$ have distinct intersections with $C$. Such a dominating and separating set $C$ is often referred to as a \emph{code} in the literature. Depending on the types of dominating and separating sets used, various problems arise under various names in the literature. In this paper, we introduce a new problem in the same realm of identification problems whereby the code, called \emph{open-separating dominating code}, or \emph{OSD-code} for short, is a dominating set and uses open neighbourhoods for separating vertices. The paper studies the fundamental properties concerning the existence, hardness and minimality of OSD-codes. Due to the emergence of a close and yet difficult to establish relation of the OSD-codes with another well-studied code in the literature called open locating dominating codes, or OLD-codes for short, we compare the two on various graph families. Finally, we also provide an equivalent reformulation of the problem of finding OSD-codes of a graph as a covering problem in a suitable hypergraph and discuss the polyhedra associated with OSD-codes, again in relation to OLD-codes of some graph families already studied in this context.
Submitted 3 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
Progress towards the two-thirds conjecture on locating-total dominating sets
Authors:
Dipayan Chakraborty,
Florent Foucaud,
Anni Hakanen,
Michael A. Henning,
Annegret K. Wagler
Abstract:
We study upper bounds on the size of optimum locating-total dominating sets in graphs. A set $S$ of vertices of a graph $G$ is a locating-total dominating set if every vertex of $G$ has a neighbor in $S$, and if any two vertices outside $S$ have distinct neighborhoods within $S$. The smallest size of such a set is denoted by $\gamma^L_t(G)$. It has been conjectured that $\gamma^L_t(G)\leq\frac{2n}{3}$ holds for every twin-free graph $G$ of order $n$ without isolated vertices. We prove that the conjecture holds for cobipartite graphs, split graphs, block graphs and subcubic graphs.
Submitted 7 August, 2024; v1 submitted 25 November, 2022;
originally announced November 2022.
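The definition of a locating-total dominating set given in the abstract above can be checked directly on small graphs. The following is a minimal brute-force sketch, not the authors' method: the function names `is_locating_total_dominating` and `min_ltd_size` and the adjacency-dictionary representation are assumptions made for illustration, and the exhaustive search is only practical for very small graphs.

```python
from itertools import combinations


def is_locating_total_dominating(adj, S):
    """Check the definition from the abstract.

    adj: dict mapping each vertex to the set of its neighbours (hypothetical format).
    S:   iterable of vertices forming the candidate set.
    """
    S = set(S)
    # Total domination: every vertex of G (inside or outside S) has a neighbour in S.
    if any(not (adj[v] & S) for v in adj):
        return False
    # Location: any two vertices outside S have distinct neighbourhoods within S.
    traces = [frozenset(adj[v] & S) for v in adj if v not in S]
    return len(traces) == len(set(traces))


def min_ltd_size(adj):
    """Smallest size of a locating-total dominating set, by exhaustive search."""
    vertices = list(adj)
    for k in range(1, len(vertices) + 1):
        for S in combinations(vertices, k):
            if is_locating_total_dominating(adj, S):
                return k
    return None  # no such set exists, e.g. when the graph has an isolated vertex


# Path on four vertices 0 - 1 - 2 - 3 (twin-free, no isolated vertices).
path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
size = min_ltd_size(path4)
print(size, 3 * size <= 2 * len(path4))
```

On this path the set $\{1, 2\}$ satisfies both conditions, so the minimum found by the search can be compared against the conjectured bound $\frac{2n}{3}$ with $n = 4$.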
-
On three domination-based identification problems in block graphs
Authors:
Dipayan Chakraborty,
Florent Foucaud,
Aline Parreau,
Annegret K. Wagler
Abstract:
The problems of determining the minimum-sized \emph{identifying}, \emph{locating-dominating} and \emph{open locating-dominating codes} of an input graph are special search problems that are challenging from both theoretical and computational viewpoints. In these problems, one selects a dominating set $C$ of a graph $G$ such that the vertices of a chosen subset of $V(G)$ (i.e. either $V(G)\setminus C$ or $V(G)$ itself) are uniquely determined by their neighborhoods in $C$. A typical line of attack for these problems is to determine tight bounds for the minimum codes in various graph classes. In this work, we present tight lower and upper bounds for all three types of codes for \emph{block graphs} (i.e. diamond-free chordal graphs). Our bounds are in terms of the number of maximal cliques (or \emph{blocks}) of a block graph and the order of the graph. Two of our upper bounds verify conjectures from the literature, with one of them now being proven for block graphs in this article. As for the lower bounds, we prove them to be linear in terms of both the number of blocks and the order of the block graph. We provide examples of families of block graphs whose minimum codes attain these bounds, thus showing each bound to be tight.
Submitted 4 July, 2024; v1 submitted 23 November, 2018;
originally announced November 2018.
-
Fleet management for autonomous vehicles: Online PDP under special constraints
Authors:
Sahar Bsaybes,
Alain Quilliot,
Annegret K. Wagler
Abstract:
The VIPAFLEET project consists in developing models and algorithms for managing a fleet of Individual Public Autonomous Vehicles (VIPA). Hereby, we consider a fleet of cars distributed at specified stations in an industrial area to supply internal transportation, where the cars can be used in different modes of circulation (tram mode, elevator mode, taxi mode). One goal is to develop and implement suitable algorithms for each mode in order to satisfy all the requests under an economic point of view by minimizing the total tour length. The innovative idea and challenge of the project is to develop and install a dynamic fleet management system that allows the operator to switch between the different modes within the different periods of the day according to the dynamic transportation demands of the users. We model the underlying online transportation system and propose a corresponding fleet management framework, to handle modes, demands and commands. We consider two modes of circulation, tram and elevator mode, propose for each mode appropriate online algorithms and evaluate their performance, both in terms of competitive analysis and practical behavior.
Submitted 30 March, 2017;
originally announced March 2017.
-
Fleet management for autonomous vehicles
Authors:
Sahar Bsaybes,
Alain Quilliot,
Annegret K. Wagler
Abstract:
The VIPAFLEET project consists in developing models and algorithms for managing a fleet of Individual Public Autonomous Vehicles (VIPA). Hereby, we consider a fleet of cars distributed at specified stations in an industrial area to supply internal transportation, where the cars can be used in different modes of circulation (tram mode, elevator mode, taxi mode). One goal is to develop and implement suitable algorithms for each mode in order to satisfy all the requests under an economic point of view by minimizing the total tour length or the makespan. The innovative idea and challenge of the project is to develop and install a dynamic fleet management system that allows the operator to switch between the different modes within the different periods of the day according to the dynamic transportation demands of the users. We model the underlying online transportation system and propose a corresponding fleet management framework, to handle modes, demands and commands. We propose for each mode appropriate online algorithms and evaluate their performance.
Submitted 6 September, 2016;
originally announced September 2016.
-
ReOpt: an Algorithm with a Quality Guaranty for Solving the Static Relocation Problem
Authors:
Sahar Bsaybes,
Sven O. Krumke,
Alain Quilliot,
Annegret K. Wagler,
Jan-Thierry Wegener
Abstract:
In a carsharing system, a fleet of cars is distributed at stations in an urban area; customers can take and return cars at any time and station. For operating such a system in a satisfactory way, the stations have to keep a good ratio between the numbers of free places and cars in each station. This leads to the problem of relocating cars between stations, which can be modeled within the framework of a metric task system. In this paper, we focus on the Static Relocation Problem, where the system has to be set into a certain state, starting from the current state. We present a combinatorial approach and provide approximation factors for several different situations.
Submitted 9 November, 2015;
originally announced November 2015.
-
Two Flow-Based Approaches for the Static Relocation Problem in Carsharing Systems
Authors:
Sahar Bsaybes,
Alain Quilliot,
Annegret K. Wagler,
Jan-Thierry Wegener
Abstract:
In a carsharing system, a fleet of cars is distributed at stations in an urban area; customers can take and return cars at any time and station. For operating such a system in a satisfactory way, the stations have to keep a good ratio between the numbers of free places and cars in each station. This leads to the problem of relocating cars between stations, which can be modeled within the framework of a metric task system. In this paper, we focus on the Static Relocation Problem, where the system has to be set into a certain state, starting from the current state. We present two approaches to solve this problem, a fast heuristic approach and an exact integer programming based method using flows in time-expanded networks, and provide some computational results.
Submitted 9 November, 2015;
originally announced November 2015.
-
Polyhedral studies of vertex coloring problems: The asymmetric representatives formulation
Authors:
Victor Campos,
Ricardo C. Corrêa,
Diego Delle Donne,
Javier Marenco,
Annegret Wagler
Abstract:
Despite the fact that some vertex coloring problems are polynomially solvable on certain graph classes, most of these problems are not "under control" from a polyhedral point of view. The equivalence between \emph{optimization} and \emph{polyhedral separation} suggests that, for these problems, there must exist formulations admitting some elegant characterization for the polytopes associated to them. Therefore, it is interesting to study known formulations for vertex coloring with the goal of finding such characterizations. In this work we study the asymmetric representatives formulation and we show that the corresponding coloring polytope, for a given graph $G$, can be interpreted as the stable set polytope of another graph obtained from $G$. This result allows us to derive complete characterizations for the corresponding coloring polytope for some families of graphs, based on known complete characterizations for the stable set polytope.
Submitted 28 August, 2015;
originally announced September 2015.
-
Automatic Network Reconstruction using ASP
Authors:
Max Ostrowski,
Torsten Schaub,
Markus Durzinsky,
Wolfgang Marwan,
Annegret Wagler
Abstract:
Building biological models by inferring functional dependencies from experimental data is an important issue in Molecular Biology. To relieve the biologist from this traditionally manual process, various approaches have been proposed to increase the degree of automation. However, available approaches often yield a single model only, rely on specific assumptions, and/or use dedicated, heuristic algorithms that are intolerant to changing circumstances or requirements in the view of the rapid progress made in Biotechnology. Our aim is to provide a declarative solution to the problem by appeal to Answer Set Programming (ASP) overcoming these difficulties. We build upon an existing approach to Automatic Network Reconstruction proposed by part of the authors. This approach has firm mathematical foundations and is well suited for ASP due to its combinatorial flavor providing a characterization of all models explaining a set of experiments. The usage of ASP has several benefits over the existing heuristic algorithms. First, it is declarative and thus transparent for biological experts. Second, it is elaboration tolerant and thus allows for an easy exploration and incorporation of biological constraints. Third, it allows for exploring the entire space of possible models. Finally, our approach offers an excellent performance, matching existing, special-purpose systems.
Submitted 28 July, 2011;
originally announced July 2011.