Showing 1–31 of 31 results for author: Khuller, S

  1. arXiv:2402.11741 [pdf, other]

    cs.DS cs.CC cs.DB cs.DC

    To Store or Not to Store: a graph theoretical approach for Dataset Versioning

    Authors: Anxin Guo, Jingwei Li, Pattara Sukprasert, Samir Khuller, Amol Deshpande, Koyel Mukherjee

    Abstract: In this work, we study the cost efficient data versioning problem, where the goal is to optimize the storage and reconstruction (retrieval) costs of data versions, given a graph of datasets as nodes and edges capturing edit/delta information. One central variant we study is MinSum Retrieval (MSR) where the goal is to minimize the total retrieval costs, while keeping the storage costs bounded. This…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted by IPDPS 2024

  2. arXiv:2402.11109 [pdf, other]

    cs.DS

    Online Flexible Busy Time Scheduling on Heterogeneous Machines

    Authors: Gruia Calinescu, Sami Davies, Samir Khuller, Shirley Zhang

    Abstract: We study the online busy time scheduling model on heterogeneous machines. In our setting, unit-length jobs arrive online with a deadline that is known to the algorithm at the job's arrival time. An algorithm has access to machines, each with different associated capacities and costs. The goal is to schedule jobs on machines before their deadline, so that the total cost incurred by the scheduling a…

    Submitted 16 February, 2024; originally announced February 2024.

  3. arXiv:2307.08979 [pdf, ps, other]

    cs.DS

    Scalable Auction Algorithms for Bipartite Maximum Matching Problems

    Authors: Quanquan C. Liu, Yiduo Ke, Samir Khuller

    Abstract: In this paper, we give new auction algorithms for maximum weighted bipartite matching (MWM) and maximum cardinality bipartite $b$-matching (MCbM). Our algorithms run in $O\left(\log n/\varepsilon^8\right)$ and $O\left(\log n/\varepsilon^2\right)$ rounds, respectively, in the blackboard distributed setting. We show that our MWM algorithm can be implemented in the distributed, interactive setting us…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: To appear in APPROX 2023

  4. arXiv:2304.07982 [pdf, other]

    cs.DS

    An Algorithmic Approach to Address Course Enrollment Challenges

    Authors: Arpita Biswas, Yiduo Ke, Samir Khuller, Quanquan C. Liu

    Abstract: Massive surges of enrollments in courses have led to a crisis in several computer science departments - not only is the demand for certain courses extremely high from majors, but the demand from non-majors is also very high. Much of the time, this leads to significant frustration on the part of the students, and getting seats in desired courses is a rather ad-hoc process. One approach is to first…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: Abstract truncated per arXiv limits

  5. arXiv:2207.03600 [pdf, other]

    cs.LG

    Individual Preference Stability for Clustering

    Authors: Saba Ahmadi, Pranjal Awasthi, Samir Khuller, Matthäus Kleindessner, Jamie Morgenstern, Pattara Sukprasert, Ali Vakilian

    Abstract: In this paper, we propose a natural notion of individual preference (IP) stability for clustering, which asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster. Our notion can be motivated from several perspectives, including game theory and algorithmic fairness. We study several questions related to our proposed notion. We first…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: Accepted to ICML'22. This is a full version of the ICML version as well as a substantially improved version of arXiv:2006.04960

  6. arXiv:2207.01551 [pdf, other]

    cs.DS

    Correlated Stochastic Knapsack with a Submodular Objective

    Authors: Sheng Yang, Samir Khuller, Sunav Choudhary, Subrata Mitra, Kanak Mahadik

    Abstract: We study the correlated stochastic knapsack problem with a submodular target function, with optional additional constraints. We utilize the multilinear extension of the submodular function, and bundle it with an adaptation of the relaxed linear constraints from Ma [Mathematics of Operations Research, Volume 43(3), 2018] on the correlated stochastic knapsack problem. The relaxation is then solved by the stoc…

    Submitted 3 August, 2022; v1 submitted 4 July, 2022; originally announced July 2022.

    Comments: Accepted to ESA 2022. (fix typo in previous version)

  7. arXiv:2206.01360 [pdf, other]

    cs.DS

    Balancing Flow Time and Energy Consumption

    Authors: Sami Davies, Samir Khuller, Shirley Zhang

    Abstract: In this paper, we study the following batch scheduling model: find a schedule that minimizes total flow time for $n$ uniform length jobs, with release times and deadlines, where the machine is only actively processing jobs in at most $k$ synchronized batches of size at most $B$. Prior work on such batch scheduling models has considered only feasibility with no regard to the flow time of the schedu…

    Submitted 2 June, 2022; originally announced June 2022.

  8. arXiv:2007.07384 [pdf, other]

    cs.LG cs.DS stat.ML

    A Pairwise Fair and Community-preserving Approach to k-Center Clustering

    Authors: Brian Brubach, Darshan Chakrabarti, John P. Dickerson, Samir Khuller, Aravind Srinivasan, Leonidas Tsepenekas

    Abstract: Clustering is a foundational problem in machine learning with numerous applications. As machine learning increases in ubiquity as a backend for automated systems, concerns about fairness arise. Much of the current literature on fairness deals with discrimination against protected classes in supervised learning (group fairness). We define a different notion of fair clustering wherein the probabilit…

    Submitted 14 July, 2020; originally announced July 2020.

  9. arXiv:2001.00257 [pdf, ps, other]

    cs.DM cs.DS math.CO

    Multi-transversals for Triangles and the Tuza's Conjecture

    Authors: Parinya Chalermsook, Samir Khuller, Pattara Sukprasert, Sumedha Uniyal

    Abstract: In this paper, we study a primal and dual relationship about triangles: For any graph $G$, let $ν(G)$ be the maximum number of edge-disjoint triangles in $G$, and $τ(G)$ be the minimum subset $F$ of edges such that $G \setminus F$ is triangle-free. It is easy to see that $ν(G) \leq τ(G) \leq 3 ν(G)$, and in fact, this rather obvious inequality holds for a much more general primal-dual relation bet…

    Submitted 3 February, 2021; v1 submitted 1 January, 2020; originally announced January 2020.

    Comments: Accepted at SODA'20

  10. arXiv:1909.03350 [pdf, other]

    cs.AI cs.DS

    An Algorithm for Multi-Attribute Diverse Matching

    Authors: Saba Ahmadi, Faez Ahmed, John P. Dickerson, Mark Fuge, Samir Khuller

    Abstract: Bipartite b-matching, where agents on one side of a market are matched to one or more agents or items on the other, is a classical model that is used in myriad application areas such as healthcare, advertising, education, and general resource allocation. Traditionally, the primary goal of such models is to maximize a linear function of the constituent matches (e.g., linear social welfare maximizat…

    Submitted 12 February, 2020; v1 submitted 7 September, 2019; originally announced September 2019.

  11. arXiv:1907.00117 [pdf, ps, other]

    cs.DS

    Min-Max Correlation Clustering via MultiCut

    Authors: Saba Ahmadi, Sainyam Galhotra, Samir Khuller, Barna Saha, Roy Schwartz

    Abstract: Correlation clustering is a fundamental combinatorial optimization problem arising in many contexts and applications that has been the subject of dozens of papers in the literature. In this problem we are given a general weighted graph where each edge is labeled positive or negative. The goal is to obtain a partitioning (clustering) of the vertices that minimizes disagreements - weight of negative…

    Submitted 28 June, 2019; originally announced July 2019.

  12. Near Optimal Coflow Scheduling in Networks

    Authors: Mosharaf Chowdhury, Samir Khuller, Manish Purohit, Sheng Yang, Jie You

    Abstract: The coflow scheduling problem has emerged as a popular abstraction in the last few years to study data communication problems within a data center. In this basic framework, each coflow has a set of communication demands and the goal is to schedule many coflows in a manner that minimizes the total weighted completion time. A coflow is said to complete when all its communication needs are met. This…

    Submitted 17 June, 2019; originally announced June 2019.

  13. arXiv:1811.10319 [pdf, other]

    cs.DS

    On the cost of essentially fair clusterings

    Authors: Ioana O. Bercea, Martin Groß, Samir Khuller, Aounon Kumar, Clemens Rösner, Daniel R. Schmidt, Melanie Schmidt

    Abstract: Clustering is a fundamental tool in data mining. It partitions points into groups (clusters) and may be used to make decisions for each point based on its group. However, this process may harm protected (minority) classes if the clustering algorithm does not adequately represent them in desirable clusters -- especially if the data is already biased. At NIPS 2017, Chierichetti et al. proposed a m…

    Submitted 26 November, 2018; originally announced November 2018.

  14. arXiv:1704.06677 [pdf, ps, other]

    cs.DS

    Select and Permute: An Improved Online Framework for Scheduling to Minimize Weighted Completion Time

    Authors: Samir Khuller, Jingling Li, Pascal Sturmfels, Kevin Sun, Prayaag Venkat

    Abstract: In this paper, we introduce a new online scheduling framework for minimizing total weighted completion time in a general setting. The framework is inspired by the work of Hall et al. [Mathematics of Operations Research, Vol 22(3):513-544, 1997] and Garg et al. [Proc. of Foundations of Software Technology and Theoretical Computer Science, pp. 96-107, 2007], who show how to convert an offline approx…

    Submitted 21 April, 2017; originally announced April 2017.

    Comments: 17 pages

  15. Scheduling Distributed Clusters of Parallel Machines: Primal-Dual and LP-based Approximation Algorithms [Full Version]

    Authors: Riley Murray, Samir Khuller, Megan Chao

    Abstract: The Map-Reduce computing framework rose to prominence with datasets of such size that dozens of machines on a single cluster were needed for individual jobs. As datasets approach the exabyte scale, a single job may need distributed processing not only on multiple machines, but on multiple clusters. We consider a scheduling problem to minimize weighted average completion time of N jobs on M distrib…

    Submitted 27 October, 2016; originally announced October 2016.

    Comments: A shorter version of this paper (one that omitted several proofs) appeared in the proceedings of the 2016 European Symposium on Algorithms

    ACM Class: F.2.2

    Journal ref: Leibniz International Proceedings in Informatics (LIPIcs), Volume 58, 2016, pages 68:1--68:17

  16. arXiv:1610.08154 [pdf, other]

    cs.DS

    LP Rounding and Combinatorial Algorithms for Minimizing Active and Busy Time

    Authors: Jessica Chang, Samir Khuller, Koyel Mukherjee

    Abstract: We consider fundamental scheduling problems motivated by energy issues. In this framework, we are given a set of jobs, each with a release time, deadline and required processing length. The jobs need to be scheduled on a machine so that at most g jobs are active at any given time. The duration for which a machine is active (i.e., "on") is referred to as its active time. The goal is to find a feasi…

    Submitted 25 October, 2016; originally announced October 2016.

    Comments: 31 pages, originally appeared in SPAA 2014

  17. arXiv:1607.05791 [pdf, other]

    cs.CG cs.DS cs.RO

    Minimizing Uncertainty through Sensor Placement with Angle Constraints

    Authors: Ioana O. Bercea, Volkan Isler, Samir Khuller

    Abstract: We study the problem of sensor placement in environments in which localization is a necessity, such as ad-hoc wireless sensor networks that allow the placement of a few anchors that know their location or sensor arrays that are tracking a target. In most of these situations, the quality of localization depends on the relative angle between the target and the pair of sensors observing it. In this p…

    Submitted 19 July, 2016; originally announced July 2016.

  18. arXiv:1606.08022 [pdf, other]

    cs.DS

    Constant factor Approximation Algorithms for Uniform Hard Capacitated Facility Location Problems: Natural LP is not too bad

    Authors: Sapna Grover, Neelima Gupta, Samir Khuller, Aditya Pancholi

    Abstract: In this paper, we give the first constant factor approximation for the capacitated knapsack median problem (CKM) for hard uniform capacities, violating the budget only by an additive factor of $f_{max}$, where $f_{max}$ is the maximum cost of a facility opened by the optimal solution, and violating capacities by a $(2+ε)$ factor. The natural LP for the problem is known to have an unbounded integrality gap when any one of…

    Submitted 23 March, 2022; v1 submitted 26 June, 2016; originally announced June 2016.

  19. arXiv:1510.03130 [pdf, other]

    cs.LG

    On Correcting Inputs: Inverse Optimization for Online Structured Prediction

    Authors: Hal Daumé III, Samir Khuller, Manish Purohit, Gregory Sanders

    Abstract: Algorithm designers typically assume that the input data is correct, and then proceed to find "optimal" or "sub-optimal" solutions using this input data. However this assumption of correct data does not always hold in practice, especially in the context of online learning systems where the objective is to learn appropriate feature weights given some training samples. Such scenarios necessitate the…

    Submitted 11 October, 2015; originally announced October 2015.

    Comments: Conference version to appear in FSTTCS, 2015

  20. arXiv:1311.2309 [pdf, other]

    cs.DS

    Analyzing the Optimal Neighborhood: Algorithms for Budgeted and Partial Connected Dominating Set Problems

    Authors: Samir Khuller, Manish Purohit, Kanthi Sarpatwar

    Abstract: We study partial and budgeted versions of the well-studied connected dominating set problem. In the partial connected dominating set problem, we are given an undirected graph $G = (V,E)$ and an integer $n'$, and the goal is to find a minimum subset of vertices that induces a connected subgraph of $G$ and dominates at least $n'$ vertices. We obtain the first polynomial time algorithm with an $O(\ln Δ)$ appro…

    Submitted 10 November, 2013; originally announced November 2013.

    Comments: 15 pages, Conference version to appear in ACM-SIAM SODA 2014

  21. arXiv:1302.4168 [pdf, other]

    cs.DB cs.DC

    Data Placement and Replica Selection for Improving Co-location in Distributed Environments

    Authors: K. Ashwin Kumar, Amol Deshpande, Samir Khuller

    Abstract: Increasing need for large-scale data analytics in a number of application domains has led to a dramatic rise in the number of distributed data management systems, both parallel relational databases, and systems that support alternative frameworks like MapReduce. There is thus an increasing contention on scarce data center resources like network bandwidth; further, the energy requirements for power…

    Submitted 18 February, 2013; originally announced February 2013.

    Comments: 12 pages, 22 figures

  22. arXiv:1208.3054 [pdf, other]

    cs.DS

    LP Rounding for k-Centers with Non-uniform Hard Capacities

    Authors: Marek Cygan, MohammadTaghi Hajiaghayi, Samir Khuller

    Abstract: In this paper we consider a generalization of the classical k-center problem with capacities. Our goal is to select k centers in a graph, and assign each node to a nearby center, so that we respect the capacity constraints on centers. The objective is to minimize the maximum distance a node has to travel to get to its assigned center. This problem is NP-hard, even when centers have no capacity res…

    Submitted 15 August, 2012; originally announced August 2012.

    Comments: To appear in FOCS 2012

  23. arXiv:1208.0312 [pdf, ps, other]

    cs.DS

    A Model for Minimizing Active Processor Time

    Authors: Jessica Chang, Harold N. Gabow, Samir Khuller

    Abstract: We introduce the following elementary scheduling problem. We are given a collection of n jobs, where each job has an integer length as well as a set $T_i$ of time intervals in which it can be feasibly scheduled. Given a parameter B, the processor can schedule up to B jobs at a timeslot t so long as it is "active" at t. The goal is to schedule all the jobs in the fewest number of active timeslots. The…

    Submitted 1 August, 2012; originally announced August 2012.

  24. arXiv:0907.5442 [pdf, ps, other]

    cs.NI cs.IT

    On Computing Compression Trees for Data Collection in Sensor Networks

    Authors: Jian Li, Amol Deshpande, Samir Khuller

    Abstract: We address the problem of efficiently gathering correlated data from a wired or a wireless sensor network, with the aim of designing algorithms with provable optimality guarantees, and understanding how close we can get to the known theoretical lower bounds. Our proposed approach is based on finding an optimal or a near-optimal compression tree for a given sensor network: a compression tre…

    Submitted 30 July, 2009; originally announced July 2009.

  25. Designing Multi-Commodity Flow Trees

    Authors: Samir Khuller, Balaji Raghavachari, Neal E. Young

    Abstract: The traditional multi-commodity flow problem assumes a given flow network in which multiple commodities are to be maximally routed in response to given demands. This paper considers the multi-commodity flow network-design problem: given a set of multi-commodity flow demands, find a network subject to certain constraints such that the commodities can be maximally routed. This paper focuses on t…

    Submitted 30 May, 2002; originally announced May 2002.

    Comments: Conference version in WADS'93

    ACM Class: F.2.2; G.2.2

    Journal ref: Information Processing Letters 50:49-55 (1994)

  26. A Network-Flow Technique for Finding Low-Weight Bounded-Degree Spanning Trees

    Authors: S. Fekete, S. Khuller, M. Klemmstein, B. Raghavachari, Neal E. Young

    Abstract: The problem considered is the following. Given a graph with edge weights satisfying the triangle inequality, and a degree bound for each vertex, compute a low-weight spanning tree such that the degree of each vertex is at most its specified bound. The problem is NP-hard (it generalizes Traveling Salesman (TSP)). This paper describes a network-flow heuristic for modifying a given tree T to meet t…

    Submitted 18 May, 2002; originally announced May 2002.

    ACM Class: F.2.2; G.2.2

    Journal ref: Journal of Algorithms 24(2):310-324 (1997)

  27. Balancing Minimum Spanning and Shortest Path Trees

    Authors: Samir Khuller, Balaji Raghavachari, Neal E. Young

    Abstract: This paper gives a simple linear-time algorithm that, given a weighted digraph, finds a spanning tree that simultaneously approximates a shortest-path tree and a minimum spanning tree. The algorithm provides a continuous trade-off: given the two trees and epsilon > 0, the algorithm returns a spanning tree in which the distance between any vertex and the root of the shortest-path tree is at most 1…

    Submitted 18 May, 2002; originally announced May 2002.

    Comments: conference version: ACM-SIAM Symposium on Discrete Algorithms (1993)

    ACM Class: F.2.2; G.2.2

    Journal ref: Algorithmica 14(4):305-322 (1995)

  28. Low-Degree Spanning Trees of Small Weight

    Authors: Samir Khuller, Balaji Raghavachari, Neal E. Young

    Abstract: The degree-d spanning tree problem asks for a minimum-weight spanning tree in which the degree of each vertex is at most d. When d=2 the problem is TSP, and in this case, the well-known Christofides algorithm provides a 1.5-approximation algorithm (assuming the edge weights satisfy the triangle inequality). In 1984, Christos Papadimitriou and Umesh Vazirani posed the challenge of finding an al…

    Submitted 18 May, 2002; originally announced May 2002.

    Comments: conference version in Symposium on Theory of Computing (1994)

    ACM Class: F.2.2; G.2.2

    Journal ref: SIAM J. Computing 25(2):355-368 (1996)

  29. Approximating the Minimum Equivalent Digraph

    Authors: Samir Khuller, Balaji Raghavachari, Neal E. Young

    Abstract: The MEG (minimum equivalent graph) problem is, given a directed graph, to find a small subset of the edges that maintains all reachability relations between nodes. The problem is NP-hard. This paper gives an approximation algorithm with performance guarantee of $\pi^2/6 \approx 1.64$. The algorithm and its analysis are based on the simple idea of contracting long cycles. (This result is strengthened slig…

    Submitted 18 May, 2002; originally announced May 2002.

    Comments: conference version in ACM-SIAM Symposium on Discrete Algorithms (1994)

    ACM Class: F.2.2; G.2.2

    Journal ref: SIAM J. Computing 24(4):859-872 (1995)

  30. A Primal-Dual Parallel Approximation Technique Applied to Weighted Set and Vertex Cover

    Authors: Samir Khuller, Uzi Vishkin, Neal Young

    Abstract: The paper describes a simple deterministic parallel/distributed (2+epsilon)-approximation algorithm for the minimum-weight vertex-cover problem and its dual (edge/element packing).

    Submitted 18 May, 2002; originally announced May 2002.

    Comments: conference version appeared in IPCO'93

    ACM Class: F.2.0; G.1.6; C.2.4

    Journal ref: Journal of Algorithms 17(2):280-289 (1994)

  31. On Strongly Connected Digraphs with Bounded Cycle Length

    Authors: Samir Khuller, Balaji Raghavachari, Neal Young

    Abstract: The MEG (minimum equivalent graph) problem is, given a directed graph, to find a small subset of the edges that maintains all reachability relations between nodes. The problem is NP-hard. This paper gives a proof that, for graphs where each directed cycle has at most three edges, the MEG problem is equivalent to maximum bipartite matching, and therefore solvable in polynomial time. This leads to…

    Submitted 9 May, 2002; originally announced May 2002.

    ACM Class: F.2.0; F.1.3

    Journal ref: Discrete Applied Mathematics 69(3):281-289 (1996)