A Competitive Posted-Price Mechanism for Online Budget-Feasible Auctions

Authors: Andreas Charalampopoulos, Dimitris Fotakis, Panagiotis Patsilinakos, Thanos Tolias

Abstract: We consider online procurement auctions, where the agents arrive sequentially, in random order, and have private costs for their services. The buyer aims to maximize a monotone submodular value function for the subset of agents whose services are procured, subject to a budget constraint on their payments. We consider a posted-price setting where upon each agent's arrival, the buyer decides on a payment offered to them. The agent accepts or rejects the offer, depending on whether the payment exceeds their cost, without revealing any other information about their private costs whatsoever. We present a randomized online posted-price mechanism with constant competitive ratio, thus resolving the main open question of (Badanidiyuru, Kleinberg and Singer, EC 2012). Posted-price mechanisms for online procurement typically operate by learning an estimation of the optimal value, denoted as OPT, and using it to determine the payments offered to the agents. The main challenge is to learn OPT within a constant factor from the agents' accept / reject responses to the payments offered. Our approach is based on an online test of whether our estimation is too low compared against OPT and a carefully designed adaptive search that gradually refines our estimation.

Submitted 13 April, 2025; v1 submitted 25 February, 2025; originally announced February 2025.
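
The abstract above notes that posted-price mechanisms use an estimate of OPT to set the payments offered. The following is a minimal sketch of that general idea only, not the paper's mechanism: it assumes the estimate is already given, and the function name posted_price_run, the scale parameter, and the pricing rule (marginal value times budget divided by scale times the estimate) are all hypothetical choices made for illustration.

```python
from typing import Callable, FrozenSet, List, Set, Tuple


def posted_price_run(
    costs: List[float],
    value: Callable[[FrozenSet[int]], float],
    budget: float,
    opt_estimate: float,
    scale: float = 2.0,
) -> Tuple[Set[int], float]:
    """Offer each arriving agent a take-it-or-leave-it payment.

    Assumption (illustrative, not from the paper): the offer is the agent's
    marginal value priced at budget / (scale * opt_estimate) per unit of value.
    """
    selected: Set[int] = set()
    spent = 0.0
    for agent, cost in enumerate(costs):  # agents arrive in list order
        before = value(frozenset(selected))
        after = value(frozenset(selected | {agent}))
        offer = (after - before) * budget / (scale * opt_estimate)
        offer = min(offer, budget - spent)  # never exceed the remaining budget
        # The buyer only observes accept / reject; the private cost is used
        # here solely to simulate the agent's response.
        if offer > 0 and cost <= offer:
            selected.add(agent)
            spent += offer
    return selected, spent


# Usage: unit-value agents (an additive, hence monotone submodular, function).
chosen, paid = posted_price_run(
    costs=[1.0, 5.0, 0.5], value=len, budget=4.0, opt_estimate=2.0
)
```

In this sketch a low estimate of OPT inflates the offers and exhausts the budget early, while a high estimate makes the offers too small to be accepted, which is why learning OPT within a constant factor matters.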

arXiv:2502.13653 [pdf, other]

A Query-Driven Approach to Space-Efficient Range Searching

Authors: Dimitris Fotakis, Andreas Kalavas, Ioannis Psarros

Abstract: We initiate a study of a query-driven approach to designing partition trees for range-searching problems. Our model assumes that a data structure is to be built for an unknown query distribution that we can access through a sampling oracle, and must be selected such that it optimizes a meaningful performance parameter in expectation. Our first contribution is to show that a near-linear sample of queries allows the construction of a partition tree with a near-optimal expected number of nodes visited during querying. We enhance this approach by treating node processing as a classification problem, leveraging fast classifiers like shallow neural networks to obtain experimentally efficient query times. Our second contribution is to develop partition trees using sparse geometric separators. Our preprocessing algorithm, based on a sample of queries, builds a balanced tree with nodes associated with separators that minimize query stabs in expectation; this yields both fast processing of each node and a small number of visited nodes, significantly reducing query time.

Submitted 19 February, 2025; originally announced February 2025.

Comments: 16 pages, 2 figures

arXiv:2502.00841 [pdf, other]

Polynomial Time Learning-Augmented Algorithms for NP-hard Permutation Problems

Authors: Evripidis Bampis, Bruno Escoffier, Dimitris Fotakis, Panagiotis Patsilinakos, Michalis Xefteris

Abstract: We consider a learning-augmented framework for NP-hard permutation problems. The algorithm has access to predictions telling, given a pair $u,v$ of elements, whether $u$ is before $v$ or not in an optimal solution. Building on the work of Braverman and Mossel (SODA 2008), we show that for a class of optimization problems including scheduling, network design and other graph permutation problems, these predictions allow us to solve them in polynomial time with high probability, provided that predictions are true with probability at least $1/2+ε$. Moreover, this can be achieved with parsimonious access to the predictions.

Submitted 2 February, 2025; originally announced February 2025.

arXiv:2410.23830 [pdf, other]

Reducing Oversmoothing through Informed Weight Initialization in Graph Neural Networks

Authors: Dimitrios Kelesis, Dimitris Fotakis, Georgios Paliouras

Abstract: In this work, we generalize the ideas of Kaiming initialization to Graph Neural Networks (GNNs) and propose a new scheme (G-Init) that reduces oversmoothing, leading to very good results in node and graph classification tasks. GNNs are commonly initialized using methods designed for other types of Neural Networks, overlooking the underlying graph topology. We analyze theoretically the variance of signals flowing forward and gradients flowing backward in the class of convolutional GNNs. We then simplify our analysis to the case of the GCN and propose a new initialization method. Our results indicate that the new method (G-Init) reduces oversmoothing in deep GNNs, facilitating their effective use. Experimental validation supports our theoretical findings, demonstrating the advantages of deep networks in scenarios with no feature information for unlabeled nodes (i.e., "cold start" scenario).

Submitted 31 October, 2024; originally announced October 2024.
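
For context on the abstract above, the sketch below shows standard Kaiming (He) normal initialization for a dense layer, where weights are drawn with variance 2 / fan_in. It is the baseline that the abstract says G-Init generalizes, not G-Init itself; the graph-aware correction is not described in the abstract and is not attempted here. The helper names kaiming_normal and sample_variance are chosen for illustration.

```python
import math
import random
from typing import List


def kaiming_normal(fan_in: int, fan_out: int, rng: random.Random) -> List[List[float]]:
    """Return a fan_out x fan_in weight matrix with entries ~ N(0, 2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def sample_variance(weights: List[List[float]]) -> float:
    """Empirical variance over all entries of the matrix."""
    flat = [w for row in weights for w in row]
    mean = sum(flat) / len(flat)
    return sum((w - mean) ** 2 for w in flat) / len(flat)


# Usage: the empirical variance should be close to 2 / fan_in = 0.01.
W = kaiming_normal(fan_in=200, fan_out=200, rng=random.Random(0))
var_W = sample_variance(W)
```

The scaling by fan_in keeps the variance of forward signals roughly constant through ReLU layers; it ignores graph topology, which is the gap the abstract points to.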

arXiv:2410.13416 [pdf, other]

Partially Trained Graph Convolutional Networks Resist Oversmoothing

Authors: Dimitrios Kelesis, Dimitris Fotakis, Georgios Paliouras

Abstract: In this work we investigate an observation made by Kipf & Welling, who suggested that untrained GCNs can generate meaningful node embeddings. In particular, we investigate the effect of training only a single layer of a GCN, while keeping the rest of the layers frozen. We propose a basis on which the effect of the untrained layers and their contribution to the generation of embeddings can be predicted. Moreover, we show that network width influences the dissimilarity of node embeddings produced after the initial node features pass through the untrained part of the model. Additionally, we establish a connection between partially trained GCNs and oversmoothing, showing that they are capable of reducing it. We verify our theoretical results experimentally and show the benefits of using deep networks that resist oversmoothing, in a "cold start" scenario, where there is a lack of feature information for unlabeled nodes.

Submitted 17 October, 2024; originally announced October 2024.

arXiv:2408.11755 [pdf, ps, other]

On the Distortion of Committee Election with 1-Euclidean Preferences and Few Distance Queries

Authors: Dimitris Fotakis, Laurent Gourvès, Panagiotis Patsilinakos

Abstract: We consider committee election of $k \geq 2$ (out of $m \geq k+1$) candidates, where the voters and the candidates are associated with locations on the real line. Each voter's cardinal preferences over candidates correspond to her distance to the candidate locations, and each voter's cardinal preferences over committees are defined as her distance to the nearest candidate elected in the committee. We consider a setting where the true distances and the locations are unknown. We can nevertheless have access to degraded information which consists of an order of candidates for each voter. We investigate the best possible distortion (a worst-case performance criterion) with respect to the social cost achieved by deterministic committee election rules based on ordinal preferences submitted by $n$ voters and few additional distance queries. For $k = 2$, we achieve bounded distortion without any distance queries; we show that the distortion is $3$ for $m = 3$, and that the best possible distortion achieved by deterministic algorithms is at least $n-1$ and at most $n+1$, for any $m \geq 4$. For any $k \geq 3$, we show that the best possible distortion of any deterministic algorithm that uses at most $k-3$ distance queries cannot be bounded by any function of $n$, $m$ and $k$. We present deterministic algorithms for $k$-committee election with distortion of $O(n)$ with $O(k)$ distance queries and $O(1)$ with $O(k \log n)$ distance queries.

Submitted 8 September, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

arXiv:2405.18921 [pdf, other]

GLANCE: Global Actions in a Nutshell for Counterfactual Explainability

Authors: Loukas Kavouras, Eleni Psaroudaki, Konstantinos Tsopelas, Dimitrios Rontogiannis, Nikolaos Theologitis, Dimitris Sacharidis, Giorgos Giannopoulos, Dimitrios Tomaras, Kleopatra Markou, Dimitrios Gunopulos, Dimitris Fotakis, Ioannis Emiris

Abstract: The widespread deployment of machine learning systems in critical real-world decision-making applications has highlighted the urgent need for counterfactual explainability methods that operate effectively. Global counterfactual explanations, expressed as actions to offer recourse, aim to provide succinct explanations and insights applicable to large population subgroups. Effectiveness is measured by the fraction of the population that is provided recourse, ensuring that the actions benefit as many individuals as possible. Keeping the cost of actions low ensures the proposed recourse actions remain practical and actionable. Limiting the number of actions that provide global counterfactuals is essential to maximize interpretability. The primary challenge, therefore, is balancing these trade-offs, i.e., maximizing effectiveness, minimizing cost, while maintaining a small number of actions. We introduce GLANCE, a versatile and adaptive framework, comprising two algorithms, that allows the careful balancing of the trade-offs among the three key objectives, with the size objective functioning as a tunable parameter to keep the actions few and easy to interpret. C-GLANCE employs a clustering approach that considers both the feature space and the space of counterfactual actions, thereby accounting for the distribution of points in a way that aligns with the structure of the model. T-GLANCE provides additional features to enhance flexibility. It employs a tree-based approach, that allows users to specify split features, to build a decision tree with a single counterfactual action at each node that can be used as a subgroup policy. Our extensive experimental evaluation demonstrates that our method consistently shows greater robustness and performance compared to existing methods across various datasets and models.

Submitted 18 October, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.02153 [pdf, other]

doi 10.1016/j.ascom.2024.100823

Reconstructing the mid-infrared spectra of galaxies using ultraviolet to submillimeter photometry and Deep Generative Networks

Authors: Agapi Rissaki, Orestis Pavlou, Dimitris Fotakis, Vicky Papadopoulou Lesta, Andreas Efstathiou

Abstract: The mid-infrared spectra of galaxies are rich in features such as the Polycyclic Aromatic Hydrocarbon (PAH) and silicate dust features which give valuable information about the physics of galaxies and their evolution. For example they can provide information about the relative contribution of star formation and accretion from a supermassive black hole to the power output of galaxies. However, the mid-infrared spectra are currently available for a very small fraction of galaxies that have been detected in deep multi-wavelength surveys of the sky. In this paper we explore whether Deep Generative Network methods can be used to reconstruct mid-infrared spectra in the 5-35μm range using the limited multi-wavelength photometry in ~20 bands from the ultraviolet to the submillimeter which is typically available in extragalactic surveys. For this purpose we use simulated spectra computed with a combination of radiative transfer models for starbursts, active galactic nucleus (AGN) tori and host galaxies. We find that our method using Deep Generative Networks, namely Generative Adversarial Networks and Generative Latent Optimization models, can efficiently produce high quality reconstructions of mid-infrared spectra in ~70% of the cases.

Submitted 3 May, 2024; originally announced May 2024.

Comments: Published in Astronomy and Computing (Volume 47, April 2024, 100823)

Journal ref: Astronomy and Computing 47 (2024) 100823

arXiv:2403.19419 [pdf, other]

doi 10.1109/ICDEW61823.2024.00032

Fairness in Ranking: Robustness through Randomization without the Protected Attribute

Authors: Andrii Kliachkin, Eleni Psaroudaki, Jakub Marecek, Dimitris Fotakis

Abstract: There has been great interest in fairness in machine learning, especially in relation to classification problems. In ranking-related problems, such as in online advertising, recommender systems, and HR automation, much work on fairness remains to be done. Two complications arise: first, the protected attribute may not be available in many applications. Second, there are multiple measures of fairness of rankings, and optimization-based methods utilizing a single measure of fairness of rankings may produce rankings that are unfair with respect to other measures. In this work, we propose a randomized method for post-processing rankings, which does not require the availability of the protected attribute. In an extensive numerical study, we show the robustness of our methods with respect to P-Fairness and effectiveness with respect to Normalized Discounted Cumulative Gain (NDCG) from the baseline ranking, improving on previously proposed methods.

Submitted 28 March, 2024; originally announced March 2024.

Journal ref: 2024 IEEE 40th International Conference on Data Engineering Workshops
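
The abstract above evaluates effectiveness with Normalized Discounted Cumulative Gain (NDCG). As a reminder of how that measure is computed, here is a minimal sketch using the common linear-gain form, DCG = sum of rel_i / log2(i + 1) over positions i = 1, 2, ..., normalized by the DCG of the ideally sorted list. The abstract does not say which NDCG variant the paper uses (an exponential-gain form with 2^rel - 1 is also common), so this choice is an assumption.

```python
import math
from typing import Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain with linear gain and log2 position discount."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(relevances, start=1))


def ndcg(relevances: Sequence[float]) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    if ideal == 0:
        return 0.0  # convention chosen here for all-zero relevance lists
    return dcg(relevances) / ideal


# Usage: a perfectly ordered ranking scores 1; a reversed one scores less.
perfect = ndcg([3, 2, 1])
reversed_score = ndcg([1, 2, 3])
```

A randomized post-processing step that reorders items can then be judged by how far its NDCG falls below that of the baseline ranking.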

arXiv:2310.05309 [pdf, other]

Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods

Authors: Constantine Caramanis, Dimitris Fotakis, Alkis Kalavasis, Vasilis Kontonis, Christos Tzamos

Abstract: Deep Neural Networks and Reinforcement Learning methods have empirically shown great promise in tackling challenging combinatorial problems. In those methods a deep neural network is used as a solution generator which is then trained by gradient-based methods (e.g., policy gradient) to successively obtain better solution distributions. In this work we introduce a novel theoretical framework for analyzing the effectiveness of such methods. We ask whether there exist generative models that (i) are expressive enough to generate approximately optimal solutions; (ii) have a tractable, i.e., polynomial in the size of the input, number of parameters; (iii) their optimization landscape is benign in the sense that it does not contain sub-optimal stationary points. Our main contribution is a positive answer to this question. Our result holds for a broad class of combinatorial problems including Max- and Min-Cut, Max-$k$-CSP, Maximum-Weight-Bipartite-Matching, and the Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla gradient descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.

Submitted 6 November, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

arXiv:2306.14978 [pdf, other]

Fairness Aware Counterfactuals for Subgroups

Authors: Loukas Kavouras, Konstantinos Tsopelas, Giorgos Giannopoulos, Dimitris Sacharidis, Eleni Psaroudaki, Nikolaos Theologitis, Dimitrios Rontogiannis, Dimitris Fotakis, Ioannis Emiris

Abstract: In this work, we present Fairness Aware Counterfactuals for Subgroups (FACTS), a framework for auditing subgroup fairness through counterfactual explanations. We start with revisiting (and generalizing) existing notions and introducing new, more refined notions of subgroup fairness. We aim to (a) formulate different aspects of the difficulty of individuals in certain subgroups to achieve recourse, i.e. receive the desired outcome, either at the micro level, considering members of the subgroup individually, or at the macro level, considering the subgroup as a whole, and (b) introduce notions of subgroup fairness that are robust, if not totally oblivious, to the cost of achieving recourse. We accompany these notions with an efficient, model-agnostic, highly parameterizable, and explainable framework for evaluating subgroup fairness. We demonstrate the advantages, the wide applicability, and the efficiency of our approach through a thorough experimental evaluation of different benchmark datasets.

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2211.12868 [pdf, other]

Perfect Sampling from Pairwise Comparisons

Authors: Dimitris Fotakis, Alkis Kalavasis, Christos Tzamos

Abstract: In this work, we study how to efficiently obtain perfect samples from a discrete distribution $\mathcal{D}$ given access only to pairwise comparisons of elements of its support. Specifically, we assume access to samples $(x, S)$, where $S$ is drawn from a distribution over sets $\mathcal{Q}$ (indicating the elements being compared), and $x$ is drawn from the conditional distribution $\mathcal{D}_S$ (indicating the winner of the comparison) and aim to output a clean sample $y$ distributed according to $\mathcal{D}$. We mainly focus on the case of pairwise comparisons where all sets $S$ have size 2. We design a Markov chain whose stationary distribution coincides with $\mathcal{D}$ and give an algorithm to obtain exact samples using the technique of Coupling from the Past. However, the sample complexity of this algorithm depends on the structure of the distribution $\mathcal{D}$ and can be even exponential in the support of $\mathcal{D}$ in many natural scenarios. Our main contribution is to provide an efficient exact sampling algorithm whose complexity does not depend on the structure of $\mathcal{D}$. To this end, we give a parametric Markov chain that mixes significantly faster given a good approximation to the stationary distribution. We can obtain such an approximation using an efficient learning from pairwise comparisons algorithm (Shah et al., JMLR 17, 2016). Our technique for speeding up sampling from a Markov chain whose stationary distribution is approximately known is simple, general and possibly of independent interest.

Submitted 25 February, 2023; v1 submitted 23 November, 2022; originally announced November 2022.

arXiv:2208.11787 [pdf, other]

Sampling and Optimal Preference Elicitation in Simple Mechanisms

Authors: Ioannis Anagnostides, Dimitris Fotakis, Panagiotis Patsilinakos

Abstract: In this work we are concerned with the design of efficient mechanisms while eliciting limited information from the agents. First, we study the performance of sampling approximations in facility location games. Our key result is to show that for any $ε > 0$, a sample of size $c(ε) = Θ(1/ε^2)$ yields in expectation a $1 + ε$ approximation with respect to the optimal social cost of the generalized median mechanism on the metric space $(\mathbb{R}^d, \| \cdot \|_1)$, while the number of agents $n \to \infty$. Moreover, we study a series of exemplar environments from auction theory through a communication complexity framework, measuring the expected number of bits elicited from the agents; we posit that any valuation can be expressed with $k$ bits, and we mainly assume that $k$ is independent of the number of agents $n$. In this context, we show that Vickrey's rule can be implemented with an expected communication of $1 + ε$ bits from an average bidder, for any $ε > 0$, asymptotically matching the trivial lower bound. As a corollary, we provide a compelling method to increment the price in an English auction. We also leverage our single-item format with an efficient encoding scheme to prove that the same communication bound can be recovered in the domain of additive valuations through simultaneous ascending auctions, assuming that the number of items is a constant. Finally, we propose an ascending-type multi-unit auction under unit demand bidders; our mechanism announces at every round two separate prices and is based on a sampling algorithm that performs approximate selection with limited communication, leading again to asymptotically optimal communication. Our results do not require any prior knowledge on the agents' valuations, and mainly follow from natural sampling techniques.

Submitted 24 August, 2022; originally announced August 2022.

Comments: Preliminary version appeared at SAGT 2020
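
The abstract above refers to Vickrey's rule. For readers unfamiliar with it, the sketch below shows the allocation and payment it prescribes for a single item: the highest bidder wins and pays the second-highest bid. It collects all bids in full, so it illustrates only the outcome being implemented, not the paper's low-communication protocol. The function name vickrey_auction and the tie-breaking rule (lowest index wins) are assumptions made for illustration.

```python
from typing import List, Tuple


def vickrey_auction(bids: List[float]) -> Tuple[int, float]:
    """Return (winner index, price), where the price is the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("need at least two bidders")
    # max() returns the first maximal element, so ties go to the lowest index.
    winner = max(range(len(bids)), key=lambda i: bids[i])
    price = max(b for i, b in enumerate(bids) if i != winner)
    return winner, price


# Usage: bidder 1 wins with a bid of 7 and pays the runner-up bid of 5.
winner, price = vickrey_auction([3.0, 7.0, 5.0])
```

Because the price does not depend on the winner's own bid, a protocol only needs to identify the winner and the runner-up value, which is what leaves room for eliciting far fewer than $k$ bits per bidder.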

arXiv:2208.11393 [pdf, other]

doi 10.1093/mnras/stac2444

Detecting and analysing the topology of the cosmic web with spatial clustering algorithms I: Methods

Authors: Dimitrios Kelesis, Spyros Basilakos, Vicky Papadopoulou Lesta, Dimitris Fotakis, Andreas Efstathiou

Abstract: In this paper we explore the use of spatial clustering algorithms as a new computational approach for modeling the cosmic web. We demonstrate that such algorithms are efficient in terms of computing time needed. We explore three distinct spatial methods which we suitably adjust for (i) detecting the topology of the cosmic web and (ii) categorizing various cosmic structures as voids, walls, clusters and superclusters based on a variety of topological and physical criteria such as the physical distance between objects, their masses and local densities. The methods explored are (1) a new spatial method called Gravity Lattice; (2) a modified version of another spatial clustering algorithm, the ABACUS; and (3) the well known spatial clustering algorithm HDBSCAN. We utilize HDBSCAN in order to detect cosmic structures and categorize them using their overdensity. We demonstrate that the ABACUS method can be combined with the classic DTFE method to obtain similar results in terms of the achieved accuracy with about an order of magnitude less computation time. To further solidify our claims, we draw insights from the computer science domain and compare the quality of the results with and without the application of our method. Finally, we further extend our experiments and verify their effectiveness by showing their ability to scale well with different cosmic web structures that formed at different redshifts.

Submitted 24 August, 2022; originally announced August 2022.

Comments: Accepted for publication in Monthly Notices of the Royal Astronomical Society

arXiv:2208.10423 [pdf, other]

Graph Connectivity with Noisy Queries

Authors: Dimitris Fotakis, Evangelia Gergatsouli, Charilaos Pipis, Miltiadis Stouras, Christos Tzamos

Abstract: Graph connectivity is a fundamental combinatorial optimization problem that arises in many practical applications, where usually a spanning subgraph of a network is used for its operation. However, in the real world, links may fail unexpectedly, rendering the networks non-operational, while checking whether a link is damaged is costly and possibly erroneous. After an event that has damaged an arbitrary subset of the edges, the network operator must find a spanning tree of the network using non-damaged edges by making as few checks as possible. Motivated by such questions, we study the problem of finding a spanning tree in a network, when we only have access to noisy queries of the form "Does edge e exist?". We design efficient algorithms, even when edges fail adversarially, for all possible error regimes: 2-sided error (where any answer might be erroneous), false positives (where "no" answers are always correct) and false negatives (where "yes" answers are always correct). In the first two regimes we provide efficient algorithms and give matching lower bounds for general graphs. In the False Negative case we design efficient algorithms for large interesting families of graphs (e.g. bounded treewidth, sparse). Using the previous results, we provide tight algorithms for the practically useful family of planar graphs in all error regimes.

Submitted 9 September, 2022; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: 22 pages, 3 figures
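
To make the noisy-query model in the abstract above concrete, the sketch below implements a naive baseline for the 2-sided error regime: repeat each edge query an odd number of times, take the majority answer, and grow a spanning forest with union-find, skipping edges whose endpoints are already connected. This is not the paper's algorithm and makes no claim about query optimality. The names spanning_tree, majority_query and make_noisy_oracle, and the assumption that each answer is flipped independently with a fixed probability, are introduced here for illustration.

```python
import random
from typing import Callable, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def make_noisy_oracle(alive: Set[Edge], flip_prob: float, rng: random.Random) -> Callable[[Edge], bool]:
    """Simulated oracle: answers 'does this edge exist?' and lies with prob. flip_prob."""
    def oracle(edge: Edge) -> bool:
        truth = edge in alive
        return (not truth) if rng.random() < flip_prob else truth
    return oracle


def majority_query(oracle: Callable[[Edge], bool], edge: Edge, repeats: int) -> bool:
    """Ask the same question `repeats` times and return the majority answer."""
    yes = sum(1 for _ in range(repeats) if oracle(edge))
    return 2 * yes > repeats


def spanning_tree(n: int, edges: List[Edge], oracle: Callable[[Edge], bool], repeats: int = 15) -> Optional[List[Edge]]:
    """Return n - 1 edges believed to be intact that span vertices 0..n-1, or None."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree: List[Edge] = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge would close a cycle, so no query is spent on it
        if majority_query(oracle, (u, v), repeats):
            parent[ru] = rv
            tree.append((u, v))
    return tree if len(tree) == n - 1 else None


# Usage with a noiseless oracle (flip_prob = 0) so the outcome is deterministic.
candidate_edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]
intact = {(0, 1), (2, 3), (0, 2)}
tree = spanning_tree(4, candidate_edges, make_noisy_oracle(intact, 0.0, random.Random(0)))
```

The baseline spends `repeats` queries on every edge it tests; reducing that overhead across the different error regimes is the kind of question the abstract addresses.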

arXiv:2202.11199 [pdf, other]

Differentially Private Regression with Unbounded Covariates

Authors: Jason Milionis, Alkis Kalavasis, Dimitris Fotakis, Stratis Ioannidis

Abstract: We provide computationally efficient, differentially private algorithms for the classical regression settings of Least Squares Fitting, Binary Regression and Linear Regression with unbounded covariates. Prior to our work, privacy constraints in such regression settings were studied under strong a priori bounds on covariates. We consider the case of Gaussian marginals and extend recent differentially private techniques on mean and covariance estimation (Kamath et al., 2019; Karwa and Vadhan, 2018) to the sub-gaussian regime. We provide a novel technical analysis yielding differentially private algorithms for the above classical regression settings. Through the case of Binary Regression, we capture the fundamental and widely-studied models of logistic regression and linearly-separable SVMs, learning an unbiased estimate of the true regression vector, up to a scaling factor.

Submitted 19 February, 2022; originally announced February 2022.

arXiv:2111.06733 [pdf, ps, other]

A Constant-Factor Approximation for Generalized Malleable Scheduling under $M^\natural$-Concave Processing Speeds

Authors: Dimitris Fotakis, Jannik Matuschke, Orestis Papadigenopoulos

Abstract: In generalized malleable scheduling, jobs can be allocated and processed simultaneously on multiple machines so as to reduce the overall makespan of the schedule. The required processing time for each job is determined by the joint processing speed of the allocated machines. We study the case that processing speeds are job-dependent $M^\natural$-concave functions and provide a constant-factor approximation for this setting, significantly expanding the realm of functions for which such an approximation is possible. Further, we explore the connection between malleable scheduling and the problem of fairly allocating items to a set of agents with distinct utility functions, devising a black-box reduction that allows us to obtain resource-augmented approximation algorithms for the latter.

Submitted 19 November, 2021; v1 submitted 12 November, 2021; originally announced November 2021.

arXiv:2111.06225 [pdf, other]

Assigning and Scheduling Generalized Malleable Jobs under Subadditive or Submodular Processing Speeds

Authors: Dimitris Fotakis, Jannik Matuschke, Orestis Papadigenopoulos

Abstract: Malleable scheduling is a model that captures the possibility of parallelization to expedite the completion of time-critical tasks. A malleable job can be allocated and processed simultaneously on multiple machines, occupying the same time interval on all these machines. We study a general version of this setting, in which the functions determining the joint processing speed of machines for a given job follow different discrete concavity assumptions (subadditivity, fractional subadditivity, submodularity, and matroid ranks). We show that under these assumptions the problem of scheduling malleable jobs at minimum makespan can be approximated by a considerably simpler assignment problem. Moreover, we provide efficient approximation algorithms for both the scheduling and the assignment problem, with increasingly stronger guarantees for increasingly stronger concavity assumptions, including a logarithmic approximation factor for the case of submodular processing speeds and a constant approximation factor when processing speeds are determined by matroid rank functions. Computational experiments indicate that our algorithms outperform the theoretical worst-case guarantees.

Submitted 28 March, 2022; v1 submitted 11 November, 2021; originally announced November 2021.

arXiv:2111.02749 [pdf, other]

Label Ranking through Nonparametric Regression

Authors: Dimitris Fotakis, Alkis Kalavasis, Eleni Psaroudaki

Abstract: Label Ranking (LR) corresponds to the problem of learning a hypothesis that maps features to rankings over a finite set of labels. We adopt a nonparametric regression approach to LR and obtain theoretical performance guarantees for this fundamental practical problem. We introduce a generative model for Label Ranking, in noiseless and noisy nonparametric regression settings, and provide sample complexity bounds for learning algorithms in both cases. In the noiseless setting, we study the LR problem with full rankings and provide computationally efficient algorithms using decision trees and random forests in the high-dimensional regime. In the noisy setting, we consider the more general cases of LR with incomplete and partial rankings from a statistical viewpoint and obtain sample complexity bounds using the One-Versus-One approach of multiclass classification. Finally, we complement our theoretical contributions with experiments, aiming to understand how the input regression noise affects the observed output.

Submitted 10 February, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

arXiv:2110.13324 [pdf, other]

Sampling Multiple Nodes in Large Networks: Beyond Random Walks

Authors: Omri Ben-Eliezer, Talya Eden, Joel Oren, Dimitris Fotakis

Abstract: Sampling random nodes is a fundamental algorithmic primitive in the analysis of massive networks, with many modern graph mining algorithms critically relying on it. We consider the task of generating a large collection of random nodes in the network assuming limited query access (where querying a node reveals its set of neighbors). In current approaches, based on long random walks, the number of queries per sample scales linearly with the mixing time of the network, which can be prohibitive for large real-world networks. We propose a new method for sampling multiple nodes that bypasses the dependence on the mixing time by explicitly searching for less accessible components in the network. We test our approach on a variety of real-world and synthetic networks with up to tens of millions of nodes, demonstrating a query complexity improvement of up to $\times 20$ compared to the state of the art.

Submitted 25 October, 2021; originally announced October 2021.

Comments: To appear in 15th ACM International Conference on Web Search and Data Mining (WSDM 2022). Code available soon at: https://github.com/omribene/sampling-nodes

arXiv:2109.02184 [pdf, other]

Dimensionality, Coordination, and Robustness in Voting

Authors: Ioannis Anagnostides, Dimitris Fotakis, Panagiotis Patsilinakos

Abstract: We study the performance of voting mechanisms from a utilitarian standpoint, under the recently introduced framework of metric-distortion, offering new insights along three main lines. First, if $d$ represents the doubling dimension of the metric space, we show that the distortion of STV is $O(d \log \log m)$, where $m$ represents the number of candidates. For doubling metrics this implies an exponential improvement over the lower bound for general metrics, and as a special case it effectively answers a question left open by Skowron and Elkind (AAAI '17) regarding the distortion of STV under low-dimensional Euclidean spaces. More broadly, this constitutes the first nexus between the performance of any voting rule and the "intrinsic dimensionality" of the underlying metric space. We also establish a nearly-matching lower bound, refining the construction of Skowron and Elkind. Moreover, motivated by the efficiency of STV, we investigate whether natural learning rules can lead to low-distortion outcomes. Specifically, we introduce simple, deterministic and decentralized exploration/exploitation dynamics, and we show that they converge to a candidate with $O(1)$ distortion. Finally, driven by applications in facility location games, we consider several refinements and extensions of the standard metric-setting. Namely, we prove that the deterministic mechanism recently introduced by Gkatzelis, Halpern, and Shah (FOCS '20) attains the optimal distortion bound of $2$ under ultra-metrics, while it also comes close to our lower bound under distances satisfying approximate triangle inequalities.

Submitted 24 March, 2022; v1 submitted 5 September, 2021; originally announced September 2021.

arXiv:2108.09805 [pdf, other]

Efficient Algorithms for Learning from Coarse Labels

Authors: Dimitris Fotakis, Alkis Kalavasis, Vasilis Kontonis, Christos Tzamos

Abstract: For many learning problems one may not have access to fine grained label information; e.g., an image can be labeled as husky, dog, or even animal depending on the expertise of the annotator. In this work, we formalize these settings and study the problem of learning from such coarse data. Instead of observing the actual labels from a set $\mathcal{Z}$, we observe coarse labels corresponding to a partition of $\mathcal{Z}$ (or a mixture of partitions). Our main algorithmic result is that essentially any problem learnable from fine grained labels can also be learned efficiently when the coarse data are sufficiently informative. We obtain our result through a generic reduction for answering Statistical Queries (SQ) over fine grained labels given only coarse labels. The number of coarse labels required depends polynomially on the information distortion due to coarsening and the number of fine labels $|\mathcal{Z}|$. We also investigate the case of (infinitely many) real valued labels focusing on a central problem in censored and truncated statistics: Gaussian mean estimation from coarse data. We provide an efficient algorithm when the sets in the partition are convex and establish that the problem is NP-hard even for very simple non-convex sets.

Submitted 24 March, 2023; v1 submitted 22 August, 2021; originally announced August 2021.

arXiv:2107.13344 [pdf, other]

On the Approximability of Multistage Min-Sum Set Cover

Authors: Dimitris Fotakis, Panagiotis Kostopanagiotis, Vasileios Nakos, Georgios Piliouras, Stratis Skoulakis

Abstract: We investigate the polynomial-time approximability of the multistage version of Min-Sum Set Cover ($\mathrm{DSSC}$), a natural and intriguing generalization of the classical List Update problem. In $\mathrm{DSSC}$, we maintain a sequence of permutations $(π^0, π^1, \ldots, π^T)$ on $n$ elements, based on a sequence of requests $(R^1, \ldots, R^T)$. We aim to minimize the total cost of updating $π^{t-1}$ to $π^{t}$, quantified by the Kendall tau distance $\mathrm{D}_{\mathrm{KT}}(π^{t-1}, π^t)$, plus the total cost of covering each request $R^t$ with the current permutation $π^t$, quantified by the position of the first element of $R^t$ in $π^t$. Using a reduction from Set Cover, we show that $\mathrm{DSSC}$ does not admit an $O(1)$-approximation, unless $\mathrm{P} = \mathrm{NP}$, and that any $o(\log n)$ (resp. $o(r)$) approximation to $\mathrm{DSSC}$ implies a sublogarithmic (resp. $o(r)$) approximation to Set Cover (resp. where each element appears at most $r$ times). Our main technical contribution is to show that $\mathrm{DSSC}$ can be approximated in polynomial-time within a factor of $O(\log^2 n)$ in general instances, by randomized rounding, and within a factor of $O(r^2)$, if all requests have cardinality at most $r$, by deterministic rounding.

Submitted 28 July, 2021; originally announced July 2021.

arXiv:2107.11977 [pdf, other]

Strategyproof Facility Location in Perturbation Stable Instances

Authors: Dimitris Fotakis, Panagiotis Patsilinakos

Abstract: We consider $k$-Facility Location games, where $n$ strategic agents report their locations on the real line, and a mechanism maps them to $k\ge 2$ facilities. Each agent seeks to minimize her distance to the nearest facility. We are interested in (deterministic or randomized) strategyproof mechanisms without payments that achieve a reasonable approximation ratio to the optimal social cost of the agents. To circumvent the inapproximability of $k$-Facility Location by deterministic strategyproof mechanisms, we restrict our attention to perturbation stable instances. An instance of $k$-Facility Location on the line is $γ$-perturbation stable (or simply, $γ$-stable), for some $γ\ge 1$, if the optimal agent clustering is not affected by moving any subset of consecutive agent locations closer to each other by a factor at most $γ$. We show that the optimal solution is strategyproof in $(2+\sqrt{3})$-stable instances whose optimal solution does not include any singleton clusters, and that allocating the facility to the agent next to the rightmost one in each optimal cluster (or to the unique agent, for singleton clusters) is strategyproof and $(n-2)/2$-approximate for $5$-stable instances (even if their optimal solution includes singleton clusters). On the negative side, we show that for any $k\ge 3$ and any $δ> 0$, there is no deterministic anonymous mechanism that achieves a bounded approximation ratio and is strategyproof in $(\sqrt{2}-δ)$-stable instances. We also prove that allocating the facility to a random agent of each optimal cluster is strategyproof and $2$-approximate in $5$-stable instances. To the best of our knowledge, this is the first time that the existence of deterministic (resp. randomized) strategyproof mechanisms with a bounded (resp. constant) approximation ratio is shown for a large and natural class of $k$-Facility Location instances.

Submitted 3 March, 2024; v1 submitted 26 July, 2021; originally announced July 2021.

arXiv:2107.08277 [pdf, other]

Improved Bounds for Online Facility Location with Predictions

Authors: Dimitris Fotakis, Evangelia Gergatsouli, Themis Gouleakis, Nikolas Patris, Thanos Tolias

Abstract: We consider Online Facility Location in the framework of learning-augmented online algorithms. In Online Facility Location (OFL), demands arrive one-by-one in a metric space and must be (irrevocably) assigned to an open facility upon arrival, without any knowledge about future demands. We focus on uniform facility opening costs and present an online algorithm for OFL that exploits potentially imperfect predictions on the locations of the optimal facilities. We prove that the competitive ratio decreases from sublogarithmic in the number of demands $n$ to constant as the so-called $η_1$ error, i.e., the sum of distances of the predicted locations to the optimal facility locations, decreases. E.g., our analysis implies that if for some $\varepsilon > 0$, $η_1 = \mathrm{OPT} / n^\varepsilon$, where $\mathrm{OPT}$ is the cost of the optimal solution, the competitive ratio becomes $O(1/\varepsilon)$. We complement our analysis with a matching lower bound establishing that the dependence of the algorithm's competitive ratio on the $η_1$ error is optimal, up to constant factors. Finally, we evaluate our algorithm on real world data and compare the performance of our learning-augmented approach against the performance of the best known algorithm for OFL without predictions.

Submitted 18 August, 2024; v1 submitted 17 July, 2021; originally announced July 2021.

arXiv:2107.03449 [pdf, ps, other]

Stochastic Opinion Dynamics for Interest Prediction in Social Networks

Authors: Marios Papachristou, Dimitris Fotakis

Abstract: We exploit the core-periphery structure and the strong homophilic properties of online social networks to develop faster and more accurate algorithms for user interest prediction. The core of modern social networks consists of relatively few influential users, whose interest profiles are publicly available, while the majority of peripheral users follow enough of them based on common interests. Our approach is to predict the interests of the peripheral nodes starting from the interests of their influential connections. To this end, we need a formal model that explains how common interests lead to network connections. Thus, we propose a stochastic interest formation model, the Nearest Neighbor Influence Model (NNIM), which is inspired by the Hegselmann-Krause opinion formation model and aims to explain how homophily shapes the network. Based on NNIM, we develop an efficient approach for predicting the interests of the peripheral users. At the technical level, we use Variational Expectation-Maximization to optimize the instantaneous likelihood function using a mean-field approximation of NNIM. We prove that our algorithm converges fast and is capable of scaling smoothly to networks with millions of nodes. Our experiments on standard network benchmarks demonstrate that our algorithm runs up to two orders of magnitude faster than the best known node embedding methods and achieves similar accuracy.

Submitted 7 July, 2021; originally announced July 2021.

arXiv:2107.02489 [pdf, other]

Metric-Distortion Bounds under Limited Information

Authors: Ioannis Anagnostides, Dimitris Fotakis, Panagiotis Patsilinakos

Abstract: In this work we study the metric distortion problem in voting theory under a limited amount of ordinal information. Our primary contribution is threefold. First, we consider mechanisms which perform a sequence of pairwise comparisons between candidates. We show that a widely-popular deterministic mechanism employed in most knockout phases yields distortion $\mathcal{O}(\log m)$ while eliciting only $m-1$ out of $Θ(m^2)$ possible pairwise comparisons, where $m$ represents the number of candidates. Our analysis for this mechanism leverages a powerful technical lemma recently developed by Kempe \cite{DBLP:conf/aaai/000120a}. We also provide a matching lower bound on its distortion. In contrast, we prove that any mechanism which performs fewer than $m-1$ pairwise comparisons is destined to have unbounded distortion. Moreover, we study the power of deterministic mechanisms under incomplete rankings. Most notably, when every agent provides her $k$-top preferences we show an upper bound of $6 m/k + 1$ on the distortion, for any $k \in \{1, 2, \dots, m\}$. Thus, we substantially improve over the previous bound of $12 m/k$ recently established by Kempe \cite{DBLP:conf/aaai/000120a,DBLP:conf/aaai/000120b}, and we come closer to matching the best-known lower bound. Finally, we are concerned with the sample complexity required to ensure near-optimal distortion with high probability. Our main contribution is to show that a random sample of $Θ(m/ε^2)$ voters suffices to guarantee distortion $3 + ε$ with high probability, for any sufficiently small $ε> 0$. This result is based on analyzing the sensitivity of the deterministic mechanism introduced by Gkatzelis, Halpern, and Shah \cite{DBLP:conf/focs/Gkatzelis0020}. Importantly, all of our sample-complexity bounds are distribution-independent.

Submitted 6 July, 2021; originally announced July 2021.

arXiv:2106.04336 [pdf, other]

Efficient Online Learning for Dynamic k-Clustering

Authors: Dimitris Fotakis, Georgios Piliouras, Stratis Skoulakis

Abstract: We study dynamic clustering problems from the perspective of online learning. We consider an online learning problem, called \textit{Dynamic $k$-Clustering}, in which $k$ centers are maintained in a metric space over time (centers may change positions) such that a dynamically changing set of $r$ clients is served in the best possible way. The connection cost at round $t$ is given by the \textit{$p$-norm} of the vector consisting of the distance of each client to its closest center at round $t$, for some $p\geq 1$ or $p = \infty$. We present a \textit{$Θ\left( \min(k,r) \right)$-regret} polynomial-time online learning algorithm and show that, under some well-established computational complexity conjectures, \textit{constant-regret} cannot be achieved in polynomial-time. In addition to the efficient solution of Dynamic $k$-Clustering, our work contributes to the long line of research on combinatorial online learning.

Submitted 8 June, 2021; originally announced June 2021.

arXiv:2012.06331 [pdf, other]

Solving Inverse Problems for Spectral Energy Distributions with Deep Generative Networks

Authors: Agapi Rissaki, Orestis Pavlou, Dimitris Fotakis, Vicky Papadopoulou, Andreas Efstathiou

Abstract: We propose an end-to-end approach for solving inverse problems for a class of complex astronomical signals, namely Spectral Energy Distributions (SEDs). Our goal is to reconstruct such signals from scarce and/or unreliable measurements. We achieve that by leveraging a learned structural prior in the form of a Deep Generative Network. Similar methods have been tested almost exclusively for images which display useful properties (e.g., locality, periodicity) that are implicitly exploited. However, SEDs lack such properties, which makes the problem more challenging. We manage to successfully extend the methods to SEDs using a Generative Latent Optimization model trained with significantly fewer and corrupted data.

Submitted 9 December, 2020; originally announced December 2020.

Comments: Accepted to NeurIPS 2020 Workshop on Machine Learning and the Physical Sciences

ACM Class: J.2; I.5.4; I.2.6

arXiv:2011.02817 [pdf, other]

Efficient Online Learning of Optimal Rankings: Dimensionality Reduction via Gradient Descent

Authors: Dimitris Fotakis, Thanasis Lianeas, Georgios Piliouras, Stratis Skoulakis

Abstract: We consider a natural model of online preference aggregation, where sets of preferred items $R_1, R_2, \ldots, R_t$ along with a demand for $k_t$ items in each $R_t$, appear online. Without prior knowledge of $(R_t, k_t)$, the learner maintains a ranking $π_t$ aiming that at least $k_t$ items from $R_t$ appear high in $π_t$. This is a fundamental problem in preference aggregation with applications to, e.g., ordering product or news items in web pages based on user scrolling and click patterns. The widely studied Generalized Min-Sum-Set-Cover (GMSSC) problem serves as a formal model for the setting above. GMSSC is NP-hard and the standard application of no-regret online learning algorithms is computationally inefficient, because they operate in the space of rankings. In this work, we show how to achieve low regret for GMSSC in polynomial-time. We employ dimensionality reduction from rankings to the space of doubly stochastic matrices, where we apply Online Gradient Descent. A key step is to show how subgradients can be computed efficiently, by solving the dual of a configuration LP. Using oblivious deterministic and randomized rounding schemes, we map doubly stochastic matrices back to rankings with a small loss in the GMSSC objective.

Submitted 5 November, 2020; originally announced November 2020.

arXiv:2011.00810 [pdf, other]

Aggregating Incomplete and Noisy Rankings

Authors: Dimitris Fotakis, Alkis Kalavasis, Konstantinos Stavropoulos

Abstract: We consider the problem of learning the true ordering of a set of alternatives from largely incomplete and noisy rankings. We introduce a natural generalization of both the classical Mallows model of ranking distributions and the extensively studied model of noisy pairwise comparisons. Our selective Mallows model outputs a noisy ranking on any given subset of alternatives, based on an underlying Mallows distribution. Assuming a sequence of subsets where each pair of alternatives appears frequently enough, we obtain strong asymptotically tight upper and lower bounds on the sample complexity of learning the underlying complete ranking and the (identities and the) ranking of the top-k alternatives from selective Mallows rankings. Moreover, building on the work of (Braverman and Mossel, 2009), we show how to efficiently compute the maximum likelihood complete ranking from selective Mallows rankings.

Submitted 27 June, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

Comments: 21 pages, 3 figures. Minor changes and experimental results added in this version. Corresponding to the camera-ready version that appeared in the 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021)

Journal ref: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:2278-2286, 2021

arXiv:2007.08669 [pdf, other]

Memoryless Algorithms for the Generalized $k$-server Problem on Uniform Metrics

Authors: Dimitris Christou, Dimitris Fotakis, Grigorios Koumoutsos

Abstract: We consider the generalized $k$-server problem on uniform metrics. We study the power of memoryless algorithms and show tight bounds of $Θ(k!)$ on their competitive ratio. In particular we show that the \textit{Harmonic Algorithm} achieves this competitive ratio and provide matching lower bounds. This improves the $\approx 2^{2^k}$ doubly-exponential bound of Chiplunkar and Vishwanathan for the more general setting of uniform metrics with different weights.

Submitted 16 July, 2020; originally announced July 2020.

arXiv:2007.02392 [pdf, ps, other]

Efficient Parameter Estimation of Truncated Boolean Product Distributions

Authors: Dimitris Fotakis, Alkis Kalavasis, Christos Tzamos

Abstract: We study the problem of estimating the parameters of a Boolean product distribution in $d$ dimensions, when the samples are truncated by a set $S \subset \{0, 1\}^d$ accessible through a membership oracle. This is the first time that the computational and statistical complexity of learning from truncated samples is considered in a discrete setting. We introduce a natural notion of fatness of the truncation set $S$, under which truncated samples reveal enough information about the true distribution. We show that if the truncation set is sufficiently fat, samples from the true distribution can be generated from truncated samples. A stunning consequence is that virtually any statistical task (e.g., learning in total variation distance, parameter estimation, uniformity or identity testing) that can be performed efficiently for Boolean product distributions, can also be performed from truncated samples, with a small increase in sample complexity. We generalize our approach to ranking distributions over $d$ alternatives, where we show how fatness implies efficient parameter estimation of Mallows models from truncated samples. Exploring the limits of learning discrete models from truncated samples, we identify three natural conditions that are necessary for efficient identifiability: (i) the truncation set $S$ should be rich enough; (ii) $S$ should be accessible through membership queries; and (iii) the truncation by $S$ should leave enough randomness in all directions. By carefully adapting the Stochastic Gradient Descent approach of (Daskalakis et al., FOCS 2018), we show that these conditions are also sufficient for efficient learning of truncated Boolean product distributions.

Submitted 24 April, 2022; v1 submitted 5 July, 2020; originally announced July 2020.

Comments: 33rd Conference on Learning Theory (COLT 2020)

arXiv:2006.09889 [pdf, ps, other]

doi 10.1007/s00224-022-10078-9

Mechanism Design for Perturbation Stable Combinatorial Auctions

Authors: Giannis Fikioris, Dimitris Fotakis

Abstract: Motivated by recent research on combinatorial markets with endowed valuations by (Babaioff et al., EC 2018) and (Ezra et al., EC 2020), we introduce a notion of perturbation stability in Combinatorial Auctions (CAs) and study the extent to which stability helps in social welfare maximization and mechanism design. A CA is $γ\textit{-stable}$ if the optimal solution is resilient to inflation, by a factor of $γ\geq 1$, of any bidder's valuation for any single item. On the positive side, we show how to compute efficiently an optimal allocation for 2-stable subadditive valuations and that a Walrasian equilibrium exists for 2-stable submodular valuations. Moreover, we show that a Parallel 2nd Price Auction (P2A) followed by a demand query for each bidder is truthful for general subadditive valuations and results in the optimal allocation for 2-stable submodular valuations. To highlight the challenges behind optimization and mechanism design for stable CAs, we show that a Walrasian equilibrium may not exist for $γ$-stable XOS valuations for any $γ$, that a polynomial-time approximation scheme does not exist for $(2-ε)$-stable submodular valuations, and that any DSIC mechanism that computes the optimal allocation for stable CAs and does not use demand queries must use exponentially many value queries. We conclude with analyzing the Price of Anarchy of P2A and Parallel 1st Price Auctions (P1A) for CAs with stable submodular and XOS valuations. Our results indicate that the quality of equilibria of simple non-truthful auctions improves only for $γ$-stable instances with $γ\geq 3$.

Submitted 15 July, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

arXiv:2003.02161 [pdf, other]

The Online Min-Sum Set Cover Problem

Authors: Dimitris Fotakis, Loukas Kavouras, Grigorios Koumoutsos, Stratis Skoulakis, Manolis Vardas

Abstract: We consider the online Min-Sum Set Cover (MSSC), a natural and intriguing generalization of the classical list update problem. In Online MSSC, the algorithm maintains a permutation on $n$ elements based on subsets $S_1, S_2, \ldots$ arriving online. The algorithm serves each set $S_t$ upon arrival, using its current permutation $π_{t}$, incurring an access cost equal to the position of the first element of $S_t$ in $π_{t}$. Then, the algorithm may update its permutation to $π_{t+1}$, incurring a moving cost equal to the Kendall tau distance of $π_{t}$ to $π_{t+1}$. The objective is to minimize the total access and moving cost for serving the entire sequence. We consider the $r$-uniform version, where each $S_t$ has cardinality $r$. List update is the special case where $r = 1$. We obtain tight bounds on the competitive ratio of deterministic online algorithms for MSSC against a static adversary, which serves the entire sequence by a single permutation. First, we show a lower bound of $(r+1)(1-\frac{r}{n+1})$ on the competitive ratio. Then, we consider several natural generalizations of successful list update algorithms and show that they fail to achieve any interesting competitive guarantee. On the positive side, we obtain an $O(r)$-competitive deterministic algorithm using ideas from online learning and the multiplicative weight updates (MWU) algorithm. Furthermore, we consider efficient algorithms. We propose a memoryless online algorithm, called Move-All-Equally, which is inspired by the Double Coverage algorithm for the $k$-server problem. We show that its competitive ratio is $Ω(r^2)$ and $2^{O(\sqrt{\log n \cdot \log r})}$, and conjecture that it is $f(r)$-competitive. We also compare Move-All-Equally against the dynamic optimal solution and obtain (almost) tight bounds by showing that it is $Ω(r \sqrt{n})$ and $O(r^{3/2} \sqrt{n})$-competitive.

Submitted 29 June, 2022; v1 submitted 4 March, 2020; originally announced March 2020.

Comments: A preliminary version of this article appeared in the Proceedings of the 47th International Colloquium on Automata, Languages and Programming (ICALP 2020)

arXiv:2002.01251 [pdf, ps, other]

Local Aggregation in Preference Games

Authors: Angelo Fanelli, Dimitris Fotakis

Abstract: In this work we introduce a new model of decision-making by agents in a social network. Agents have innate preferences over the strategies but, because of the social interactions, the decisions of the agents are affected not only by their innate preferences but also by the decisions taken by their social neighbors. We assume that the strategies of the agents are embedded in an approximate metric space. Furthermore, departing from the previous literature, we assume that, due to the lack of information, each agent locally represents the trend of the network through an aggregate value, which can be interpreted as the output of an aggregation function. We answer some fundamental questions related to the existence and efficiency of pure Nash equilibria.

Submitted 4 February, 2020; originally announced February 2020.

arXiv:1911.08704 [pdf, other]

Node Max-Cut and Computing Equilibria in Linear Weighted Congestion Games

Authors: Dimitris Fotakis, Vardis Kandiros, Thanasis Lianeas, Nikos Mouzakis, Panagiotis Patsilinakos, Stratis Skoulakis

Abstract: In this work, we seek a more refined understanding of the complexity of local optimum computation for Max-Cut and pure Nash equilibrium (PNE) computation for congestion games with weighted players and linear latency functions. We show that computing a PNE of linear weighted congestion games is PLS-complete either for very restricted strategy spaces, namely when player strategies are paths on a series-parallel network with a single origin and destination, or for very restricted latency functions, namely when the latency on each resource is equal to the congestion. Our results reveal a remarkable gap regarding the complexity of PNE in congestion games with weighted and unweighted players, since in the case of unweighted players, a PNE can be easily computed by either a simple greedy algorithm (for series-parallel networks) or any better response dynamics (when the latency is equal to the congestion). For the latter of the results above, we need to show first that computing a local optimum of a natural restriction of Max-Cut, which we call Node-Max-Cut, is PLS-complete. In Node-Max-Cut, the input graph is vertex-weighted and the weight of each edge is equal to the product of the weights of its endpoints. Due to the very restricted nature of Node-Max-Cut, the reduction requires a careful combination of new gadgets with ideas and techniques from previous work. We also show how to compute efficiently a $(1+ε)$-approximate equilibrium for Node-Max-Cut, if the number of different vertex weights is constant.

Submitted 23 February, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

arXiv:1906.01009 [pdf, ps, other]

Optimal Learning of Mallows Block Model

Authors: Róbert Busa-Fekete, Dimitris Fotakis, Balázs Szörényi, Manolis Zampetakis

Abstract: The Mallows model, introduced in the seminal paper of Mallows 1957, is one of the most fundamental ranking distributions over the symmetric group $S_m$. To analyze more complex ranking data, several studies considered the Generalized Mallows model defined by Fligner and Verducci 1986. Despite the significant research interest in ranking distributions, the exact sample complexity of estimating the parameters of a Mallows and a Generalized Mallows Model is not well-understood. The main result of the paper is a tight sample complexity bound for learning Mallows and Generalized Mallows Models. We approach the learning problem by analyzing a more general model which interpolates between the single parameter Mallows Model and the $m$ parameter Mallows model. We call our model the Mallows Block Model, referring to the Block Models that are a popular model in theoretical statistics. Our sample complexity analysis gives a tight bound for learning the Mallows Block Model for any number of blocks. We provide essentially matching lower bounds for our sample complexity results. As a corollary of our analysis, it turns out that, if the central ranking is known, one single sample from the Mallows Block Model is sufficient to estimate the spread parameters with error that goes to zero as the size of the permutations goes to infinity. In addition, we calculate the exact rate of the parameter estimation error.

Submitted 3 June, 2019; originally announced June 2019.

arXiv:1905.12379 [pdf, ps, other]

Reallocating Multiple Facilities on the Line

Authors: Dimitris Fotakis, Loukas Kavouras, Panagiotis Kostopanagiotis, Philip Lazos, Stratis Skoulakis, Nikolas Zarifis

Abstract: We study the multistage $K$-facility reallocation problem on the real line, where we maintain $K$ facility locations over $T$ stages, based on the stage-dependent locations of $n$ agents. Each agent is connected to the nearest facility at each stage, and the facilities may move from one stage to another, to accommodate different agent locations. The objective is to minimize the connection cost of the agents plus the total moving cost of the facilities, over all stages. $K$-facility reallocation was introduced by de Keijzer and Wojtczak, who mostly focused on the special case of a single facility. Using an LP-based approach, we present a polynomial time algorithm that computes the optimal solution for any number of facilities. We also consider online $K$-facility reallocation, where the algorithm becomes aware of agent locations in a stage-by-stage fashion. By exploiting an interesting connection to the classical $K$-server problem, we present a constant-competitive algorithm for $K = 2$ facilities.

Submitted 29 May, 2019; originally announced May 2019.

arXiv:1903.11016 [pdf, ps, other]

Malleable scheduling beyond identical machines

Authors: Dimitris Fotakis, Jannik Matuschke, Orestis Papadigenopoulos

Abstract: In malleable job scheduling, jobs can be executed simultaneously on multiple machines with the processing time depending on the number of allocated machines. In this setting, jobs are required to be executed non-preemptively and in unison, in the sense that they occupy, during their execution, the same time interval over all the machines of the allocated set. In this work, we study generalizations of malleable job scheduling inspired by standard scheduling on unrelated machines. Specifically, we introduce a general model of malleable job scheduling, where each machine has a (possibly different) speed for each job, and the processing time of a job $j$ on a set of allocated machines $S$ depends on the total speed of $S$ with respect to $j$. For machines with unrelated speeds, we show that the optimal makespan cannot be approximated within a factor less than $\frac{e}{e-1}$, unless $P = NP$. On the positive side, we present polynomial-time algorithms with approximation ratios $\frac{2e}{e-1}$ for machines with unrelated speeds, $3$ for machines with uniform speeds, and $7/3$ for restricted assignments on identical machines. Our algorithms are based on deterministic LP rounding. They result in sparse schedules, in the sense that each machine shares at most one job with other machines. We also prove lower bounds on the integrality gap of $1+\varphi$ for unrelated speeds ($\varphi$ is the golden ratio) and $2$ for uniform speeds and restricted assignments. To indicate the generality of our approach, we show that it also yields constant factor approximation algorithms for a variant where we determine the effective speed of a set of allocated machines based on the $L_p$ norm of their speeds.

Submitted 7 April, 2020; v1 submitted 26 March, 2019; originally announced March 2019.

arXiv:1809.01803 [pdf, ps, other]

A Bridge between Liquid and Social Welfare in Combinatorial Auctions with Submodular Bidders

Authors: Dimitris Fotakis, Kyriakos Lotidis, Chara Podimata

Abstract: We study incentive compatible mechanisms for Combinatorial Auctions where the bidders have submodular (or XOS) valuations and are budget-constrained. Our objective is to maximize the liquid welfare, a notion of efficiency for budget-constrained bidders introduced by Dobzinski and Paes Leme (2014). We show that some of the known truthful mechanisms that best-approximate the social welfare for Combinatorial Auctions with submodular bidders through demand query oracles can be adapted, so that they retain truthfulness and achieve asymptotically the same approximation guarantees for the liquid welfare. More specifically, for the problem of optimizing the liquid welfare in Combinatorial Auctions with submodular bidders, we obtain a universally truthful randomized $O(\log m)$-approximate mechanism, where $m$ is the number of items, by adapting the mechanism of Krysta and Vöcking (2012). Additionally, motivated by large market assumptions often used in mechanism design, we introduce a notion of competitive markets and show that in such markets, liquid welfare can be approximated within a constant factor by a randomized universally truthful mechanism. Finally, in the Bayesian setting, we obtain a truthful $O(1)$-approximate mechanism for the case where bidder valuations are generated as independent samples from a known distribution, by adapting the results of Feldman, Gravin and Lucier (2014).

Submitted 12 December, 2018; v1 submitted 5 September, 2018; originally announced September 2018.

Comments: AAAI-19

arXiv:1707.05662 [pdf, ps, other]

Learning Powers of Poisson Binomial Distributions

Authors: Dimitris Fotakis, Vasilis Kontonis, Piotr Krysta, Paul Spirakis

Abstract: We introduce the problem of simultaneously learning all powers of a Poisson Binomial Distribution (PBD). A PBD of order $n$ is the distribution of a sum of $n$ mutually independent Bernoulli random variables $X_i$, where $\mathbb{E}[X_i] = p_i$. The $k$'th power of this distribution, for $k$ in a range $[m]$, is the distribution of $P_k = \sum_{i=1}^n X_i^{(k)}$, where each Bernoulli random variable $X_i^{(k)}$ has $\mathbb{E}[X_i^{(k)}] = (p_i)^k$. The learning algorithm can query any power $P_k$ several times and succeeds in learning all powers in the range, if with probability at least $1-δ$: given any $k \in [m]$, it returns a probability distribution $Q_k$ with total variation distance from $P_k$ at most $ε$. We provide almost matching lower and upper bounds on query complexity for this problem. We first show a lower bound on the query complexity on PBD powers instances with many distinct parameters $p_i$ which are separated, and we almost match this lower bound by examining the query complexity of simultaneously learning all the powers of a special class of PBDs resembling the PBDs of our lower bound. We study the fundamental setting of a Binomial distribution, and provide an optimal algorithm which uses $O(1/ε^2)$ samples. Diakonikolas, Kane and Stewart [COLT'16] showed a lower bound of $Ω(2^{1/ε})$ samples to learn the $p_i$'s within error $ε$. The question of whether sampling from powers of PBDs can reduce this sampling complexity has a negative answer, since we show that the exponential number of samples is inevitable. Having sampling access to the powers of a PBD, we then give a nearly optimal algorithm that learns its $p_i$'s. To prove our last two lower bounds we extend the classical minimax risk definition from statistics to estimating functions of sequences of distributions.

Submitted 18 July, 2017; originally announced July 2017.

arXiv:1602.06411 [pdf, other]

On the Size and the Approximability of Minimum Temporally Connected Subgraphs

Authors: Kyriakos Axiotis, Dimitris Fotakis

Abstract: We consider temporal graphs with discrete time labels and investigate the size and the approximability of minimum temporally connected spanning subgraphs. We present a family of minimally connected temporal graphs with $n$ vertices and $Ω(n^2)$ edges, thus resolving an open question of (Kempe, Kleinberg, Kumar, JCSS 64, 2002) about the existence of sparse temporal connectivity certificates. Next, we consider the problem of computing a minimum weight subset of temporal edges that preserves connectivity of a given temporal graph either from a given vertex r (r-MTC problem) or among all vertex pairs (MTC problem). We show that the approximability of r-MTC is closely related to the approximability of Directed Steiner Tree and that r-MTC can be solved in polynomial time if the underlying graph has bounded treewidth. We also show that the best approximation ratio for MTC is at least $O(2^{\log^{1-ε} n})$ and at most $O(\min\{n^{1+ε}, (ΔM)^{2/3+ε}\})$, for any constant $ε > 0$, where $M$ is the number of temporal edges and $Δ$ is the maximum degree of the underlying graph. Furthermore, we prove that the unweighted version of MTC is APX-hard and that MTC is efficiently solvable in trees and $2$-approximable in cycles.

Submitted 20 February, 2016; originally announced February 2016.

arXiv:1602.05263 [pdf, other]

Scheduling MapReduce Jobs under Multi-Round Precedences

Authors: Dimitris Fotakis, Ioannis Milis, Orestis Papadigenopoulos, Vasilis Vassalos, Georgios Zois

Abstract: We consider non-preemptive scheduling of MapReduce jobs with multiple tasks in the practical scenario where each job requires several map-reduce rounds. We seek to minimize the average weighted completion time and consider scheduling on identical and unrelated parallel processors. For identical processors, we present LP-based O(1)-approximation algorithms. For unrelated processors, the approximation ratio naturally depends on the maximum number of rounds of any job. Since the number of rounds per job in typical MapReduce algorithms is a small constant, our scheduling algorithms achieve a small approximation ratio in practice. For the single-round case, we substantially improve on the previously best known approximation guarantees for both identical and unrelated processors. Moreover, we conduct an experimental analysis and compare the performance of our algorithms against a fast heuristic and a lower bound on the optimal solution, thus demonstrating their promising practical performance.

Submitted 16 February, 2016; originally announced February 2016.

arXiv:1507.04391 [pdf, ps, other]

Sub-exponential Approximation Schemes for CSPs: from Dense to Almost Sparse

Authors: Dimitris Fotakis, Michael Lampis, Vangelis Th. Paschos

Abstract: It has long been known, since the classical work of (Arora, Karger, Karpinski, JCSS 99), that Max-Cut admits a PTAS on dense graphs, and more generally, Max-$k$-CSP admits a PTAS on "dense" instances with $Ω(n^k)$ constraints. In this paper we extend and generalize their exhaustive sampling approach, presenting a framework for $(1-ε)$-approximating any Max-$k$-CSP problem in sub-exponential time while significantly relaxing the denseness requirement on the input instance. Specifically, we prove that for any constants $δ \in (0, 1]$ and $ε > 0$, we can approximate Max-$k$-CSP problems with $Ω(n^{k-1+δ})$ constraints within a factor of $(1-ε)$ in time $2^{O(n^{1-δ}\ln n /ε^3)}$. The framework is quite general and includes classical optimization problems, such as Max-Cut, Max-DICUT, Max-$k$-SAT, and (with a slight extension) $k$-Densest Subgraph, as special cases. For Max-Cut in particular (where $k=2$), it gives an approximation scheme that runs in time sub-exponential in $n$ even for "almost-sparse" instances (graphs with $n^{1+δ}$ edges). We prove that our results are essentially best possible, assuming the ETH. First, the density requirement cannot be relaxed further: there exists a constant $r < 1$ such that for all $δ > 0$, Max-$k$-SAT instances with $O(n^{k-1})$ clauses cannot be approximated within a ratio better than $r$ in time $2^{O(n^{1-δ})}$. Second, the running time of our algorithm is almost tight for all densities. Even for Max-Cut there exists $r<1$ such that for all $δ' > δ > 0$, Max-Cut instances with $n^{1+δ}$ edges cannot be approximated within a ratio better than $r$ in time $2^{n^{1-δ'}}$.

Submitted 15 July, 2015; originally announced July 2015.

arXiv:1507.02301 [pdf, ps, other]

Who to Trust for Truthfully Maximizing Welfare?

Authors: Dimitris Fotakis, Christos Tzamos, Emmanouil Zampetakis

Abstract: We introduce a general approach based on selective verification and obtain approximate mechanisms without money for maximizing the social welfare in the general domain of utilitarian voting. Having a good allocation in mind, a mechanism with verification selects a few critical agents and detects, using a verification oracle, whether they have reported truthfully. If yes, the mechanism produces the desired allocation. Otherwise, the mechanism ignores any misreports and proceeds with the remaining agents. We obtain randomized truthful (or almost truthful) mechanisms without money that verify only $O(\ln m / ε)$ agents, where $m$ is the number of outcomes, independently of the total number of agents, and are $(1-ε)$-approximate for the social welfare. We also show that any truthful mechanism with a constant approximation ratio needs to verify $Ω(\log m)$ agents. A remarkable property of our mechanisms is robustness, namely that their outcome depends only on the reports of the truthful agents.

Submitted 8 July, 2015; originally announced July 2015.

arXiv:1312.4203 [pdf, other]

Scheduling MapReduce Jobs and Data Shuffle on Unrelated Processors

Authors: Dimitrios Fotakis, Ioannis Milis, Emmanouil Zampetakis, Georgios Zois

Abstract: We propose constant approximation algorithms for generalizations of the Flexible Flow Shop (FFS) problem which form a realistic model for non-preemptive scheduling in MapReduce systems. Our results concern the minimization of the total weighted completion time of a set of MapReduce jobs on unrelated processors and improve substantially on the model proposed by Moseley et al. (SPAA 2011) in two directions. First, we consider each job consisting of multiple Map and Reduce tasks, as this is the key idea behind MapReduce computations, and we propose a constant approximation algorithm. Then, we introduce into our model the crucial cost of the data shuffle phase, i.e., the cost for the transmission of intermediate data from Map to Reduce tasks. In fact, we model this phase by an additional set of Shuffle tasks for each job and we manage to keep the same approximation ratio when they are scheduled on the same processors as the corresponding Reduce tasks and to provide also a constant ratio when they are scheduled on different processors. This is the most general setting of the FFS problem (with a special third stage) for which a constant approximation ratio is known.

Submitted 24 June, 2014; v1 submitted 15 December, 2013; originally announced December 2013.

arXiv:1312.2990 [pdf, ps, other]

Efficient Lineage for SUM Aggregate Queries

Authors: Foto N. Afrati, Dimitris Fotakis, Angelos Vasilakopoulos

Abstract: AI systems typically make decisions and find patterns in data based on the computation of aggregate and specifically sum functions, expressed as queries, on data's attributes. This computation can become costly or even inefficient when these queries concern the whole or big parts of the data and especially when we are dealing with big data. New types of intelligent analytics require also the explanation of why something happened. In this paper we present a randomised algorithm that constructs a small summary of the data, called Aggregate Lineage, which can approximate well and explain all sums with large values in time that depends only on its size. The size of Aggregate Lineage is practically independent of the size of the original data. Our algorithm does not assume any knowledge of the set of sum queries to be approximated.

Submitted 9 June, 2014; v1 submitted 10 December, 2013; originally announced December 2013.

arXiv:1310.0177 [pdf, ps, other]

Combinatorial Auctions without Money

Authors: Dimitris Fotakis, Piotr Krysta, Carmine Ventre

Abstract: Algorithmic Mechanism Design attempts to marry computation and incentives, mainly by leveraging monetary transfers between designer and selfish agents involved. This is principally because in the absence of money, very little can be done to enforce truthfulness. However, in certain applications, money is unavailable, morally unacceptable or might simply be at odds with the objective of the mechanism. For example, in Combinatorial Auctions (CAs), the paradigmatic problem of the area, we aim at solutions of maximum social welfare but still charge the society to ensure truthfulness. Additionally, truthfulness of CAs is poorly understood already in the case in which bidders happen to be interested in only two different sets of goods. We focus on the design of incentive-compatible CAs without money in the general setting of $k$-minded bidders. We trade monetary transfers with the observation that the mechanism can detect certain lies of the bidders: i.e., we study truthful CAs with verification and without money. We prove a characterization of truthful mechanisms, which makes an interesting parallel with the well-understood case of CAs with money for single-minded bidders. We then give a host of upper bounds on the approximation ratio obtained by either deterministic or randomized truthful mechanisms when the sets and valuations are private knowledge of the bidders. (Most of these mechanisms run in polynomial time and return solutions with (nearly) best possible approximation guarantees.) We complement these positive results with a number of lower bounds (some of which are essentially tight) that hold in the easier case of public sets. We thus provide an almost complete picture of truthfully approximating CAs in this general setting with multi-dimensional bidders.

Submitted 1 October, 2013; originally announced October 2013.

arXiv:1305.3333 [pdf, ps, other]

Strategy-Proof Facility Location for Concave Cost Functions

Authors: Dimitris Fotakis, Christos Tzamos

Abstract: We consider k-Facility Location games, where n strategic agents report their locations on the real line, and a mechanism maps them to k facilities. Each agent seeks to minimize his connection cost, given by a nonnegative increasing function of his distance to the nearest facility. Departing from previous work, which mostly considers the identity cost function, we are interested in mechanisms without payments that are (group) strategyproof for any given cost function, and achieve a good approximation ratio for the social cost and/or the maximum cost of the agents. We present a randomized mechanism, called Equal Cost, which is group strategyproof and achieves a bounded approximation ratio for all k and n, for any given concave cost function. The approximation ratio is at most 2 for Max Cost and at most n for Social Cost. To the best of our knowledge, this is the first mechanism with a bounded approximation ratio for instances with k > 2 facilities and any number of agents. Our result implies an interesting separation between deterministic mechanisms, whose approximation ratio for Max Cost jumps from 2 to unbounded when k increases from 2 to 3, and randomized mechanisms, whose approximation ratio remains at most 2 for all k. On the negative side, we exclude the possibility of a mechanism with the properties of Equal Cost for strictly convex cost functions. We also present a randomized mechanism, called Pick the Loser, which applies to instances with k facilities and n = k+1 agents, and for any given concave cost function, is strongly group strategyproof and achieves an approximation ratio of 2 for Social Cost.

Submitted 14 May, 2013; originally announced May 2013.

ACM Class: F.2.0; J.4

Showing 1–50 of 55 results for author: Fotakis, D