Showing 1–27 of 27 results for author: Ahmadian, S

  1. arXiv:2409.11378 [pdf, other]

    cs.CL cs.AI

    Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement

    Authors: Simon Yu, Liangyu Chen, Sara Ahmadian, Marzieh Fadaee

    Abstract: Finetuning large language models on instruction data is crucial for enhancing pre-trained knowledge and improving instruction-following capabilities. As instruction datasets proliferate, selecting optimal data for effective training becomes increasingly important. This work addresses the question: How can we determine the optimal subset of data for effective training? While existing research often…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: 21 pages, 6 figures

  2. arXiv:2405.18754 [pdf, other]

    cs.DS cs.LG

    GIST: Greedy Independent Set Thresholding for Diverse Data Summarization

    Authors: Matthew Fahrbach, Srikumar Ramalingam, Morteza Zadimoghaddam, Sara Ahmadian, Gui Citovsky, Giulia DeSalvo

    Abstract: We introduce a novel subset selection problem called min-distance diversification with monotone submodular utility ($\textsf{MDMS}$), which has a wide variety of applications in machine learning, e.g., data sampling and feature selection. Given a set of points in a metric space, the goal of $\textsf{MDMS}$ is to maximize an objective function combining a monotone submodular utility term and a min-…

    Submitted 10 February, 2025; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 19 pages, 3 figures

  3. arXiv:2405.17780 [pdf, other]

    cs.DS

    Unmasking Vulnerabilities: Cardinality Sketches under Adaptive Inputs

    Authors: Sara Ahmadian, Edith Cohen

    Abstract: Cardinality sketches are popular data structures that enhance the efficiency of working with large data sets. The sketches are randomized representations of sets that are only of logarithmic size but can support set merges and approximate cardinality (i.e., distinct count) queries. When queries are not adaptive, that is, they do not depend on preceding query responses, the design provides strong g…

    Submitted 27 May, 2024; originally announced May 2024.

    Journal ref: ICML 2024

  4. arXiv:2404.10286 [pdf, other]

    quant-ph

    A novel scheme for modelling dissipation or thermalization in open quantum systems

    Authors: Fardin Kheirandish, Elmira Bolandhemmat, Narges Cheraghpour, Ronak Moradi, Servieh Ahmadian

    Abstract: In this letter, we introduce a novel method for investigating dissipation (gain) and thermalization in an open quantum system. In this method, the quantum system is coupled linearly with a copy of itself or with another system described by a finite number of bosonic operators. The time-dependent coupling functions play a fundamental role in this scheme. To demonstrate the efficiency and significan…

    Submitted 15 November, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: 22 pages, 8 figures

  5. arXiv:2402.03252 [pdf, other]

    cs.LG cs.CY

    Fair Active Ranking from Pairwise Preferences

    Authors: Sruthi Gorantla, Sara Ahmadian

    Abstract: We investigate the problem of probably approximately correct and fair (PACF) ranking of items by adaptively evoking pairwise comparisons. Given a set of $n$ items that belong to disjoint groups, our goal is to find an $(ε, δ)$-PACF-Ranking according to a fair objective function that we propose. We assume access to an oracle, wherein, for each query, the learner can choose a pair of items and recei…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 39 pages, 3.1 MB

  6. arXiv:2401.05987 [pdf, ps, other]

    cs.SE

    Reconstruction as a service: a data space for off-site image reconstruction in magnetic particle imaging

    Authors: Anselm von Gladiss, Amir Shayan Ahmadian, Jan Jürjens

    Abstract: Magnetic particle imaging (MPI) is an emerging medical imaging modality which offers a unique combination of high temporal and spatial resolution, sensitivity and biocompatibility. For system-matrix (SM) based image reconstruction in MPI, a huge amount of calibration data needs to be acquired prior to reconstruction in a time-consuming procedure. Conventionally, the data is recorded on-site inside…

    Submitted 11 January, 2024; originally announced January 2024.

  7. DeMEtRIS: Counting (near)-Cliques by Crawling

    Authors: Suman K. Bera, Jayesh Choudhari, Shahrzad Haddadan, Sara Ahmadian

    Abstract: We study the problem of approximately counting cliques and near cliques in a graph, where the access to the graph is only available through crawling its vertices; thus typically seeing only a small portion of it. This model, known as the random walk model or the neighborhood query model, has been introduced recently and captures real-life scenarios in which the entire graph is too massive to be sto…

    Submitted 7 December, 2022; originally announced December 2022.

  8. arXiv:2206.05050 [pdf, other]

    cs.LG cs.DS

    Improved Approximation for Fair Correlation Clustering

    Authors: Sara Ahmadian, Maryam Negahbani

    Abstract: Correlation clustering is a ubiquitous paradigm in unsupervised machine learning where addressing unfairness is a major challenge. Motivated by this, we study Fair Correlation Clustering where the data points may belong to different protected groups and the goal is to ensure fair representation of all groups across clusters. Our paper significantly generalizes and improves on the quality guarantee…

    Submitted 8 June, 2022; originally announced June 2022.

  9. A Modeling Framework for Reliability of Erasure Codes in SSD Arrays

    Authors: Mostafa Kishani, Saba Ahmadian, Hossein Asadi

    Abstract: To improve the reliability of SSD arrays, Redundant Array of Independent Disks (RAID) configurations are commonly employed. However, the conventional reliability models of HDD RAID cannot be applied to SSD arrays, as the nature of failures in SSDs is different from that in HDDs. Previous studies on the reliability of SSD arrays are based on deprecated SSD failure data, and only focus on limited failure types, device failu…

    Submitted 23 December, 2021; originally announced December 2021.

    Journal ref: in IEEE Transactions on Computers, vol. 69, no. 5, pp. 649-665, 1 May 2020

  10. ETICA: Efficient Two-Level I/O Caching Architecture for Virtualized Platforms

    Authors: Saba Ahmadian, Reza Salkhordeh, Onur Mutlu, Hossein Asadi

    Abstract: In this paper, we propose an Efficient Two-Level I/O Caching Architecture (ETICA) for virtualized platforms that can significantly improve I/O latency, endurance, and cost (in terms of cache size) while preserving the reliability of write-pending data blocks. As opposed to previous one-level I/O caching schemes in virtualized platforms, our proposed architecture 1) provides two levels of cache by…

    Submitted 14 June, 2021; originally announced June 2021.

    Journal ref: IEEE Transactions on Parallel and Distributed Systems (Volume: 32, Issue: 10, Oct. 1 2021)

  11. arXiv:2102.11548 [pdf, other]

    cs.DS

    Maximizing Agreements for Ranking, Clustering and Hierarchical Clustering via MAX-CUT

    Authors: Vaggos Chatziafratis, Mohammad Mahdian, Sara Ahmadian

    Abstract: In this paper, we study a number of well-known combinatorial optimization problems that fit in the following paradigm: the input is a collection of (potentially inconsistent) local relationships between the elements of a ground set (e.g., pairwise comparisons, similar/dissimilar pairs, or ancestry structure of triples of points), and the goal is to aggregate this information into a global structur…

    Submitted 23 February, 2021; originally announced February 2021.

    Comments: AISTATS 2021 accepted paper

  12. arXiv:2012.01691 [pdf, other]

    cs.SI

    The Wedge Picking Model: A dynamic graph model based on triadic closure

    Authors: Sara Ahmadian, Shahrzad Haddadan

    Abstract: Social networks have become an inseparable part of human life and processing them in an efficient manner is a top priority in the study of networks. These networks are highly dynamic and they are growing incessantly. Inspired by the concept of triadic closure, we propose a probabilistic mechanism to model the evolution of these dynamic graphs. Although triadic closure is ubiquitous in social netwo…

    Submitted 2 December, 2020; originally announced December 2020.

  13. arXiv:2006.10221 [pdf, other]

    cs.DS cs.LG stat.ML

    Fair Hierarchical Clustering

    Authors: Sara Ahmadian, Alessandro Epasto, Marina Knittel, Ravi Kumar, Mohammad Mahdian, Benjamin Moseley, Philip Pham, Sergei Vassilvitskii, Yuyan Wang

    Abstract: As machine learning has become more prevalent, researchers have begun to recognize the necessity of ensuring machine learning systems are fair. Recently, there has been an interest in defining a notion of fairness that mitigates over-representation in traditional clustering. In this paper we extend this notion to hierarchical clustering, where the goal is to recursively partition the data to opt…

    Submitted 18 June, 2020; v1 submitted 17 June, 2020; originally announced June 2020.

  14. arXiv:2002.02274 [pdf, other]

    cs.DS cs.AI cs.LG stat.ML

    Fair Correlation Clustering

    Authors: Sara Ahmadian, Alessandro Epasto, Ravi Kumar, Mohammad Mahdian

    Abstract: In this paper, we study correlation clustering under fairness constraints. Fair variants of $k$-median and $k$-center clustering have been studied recently, and approximation algorithms using a notion called fairlet decomposition have been proposed. We obtain approximation algorithms for fair correlation clustering under several important types of fairness constraints. Our results hinge on obtai…

    Submitted 2 March, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

  15. arXiv:1912.06983 [pdf, other]

    cs.DS

    Bisect and Conquer: Hierarchical Clustering via Max-Uncut Bisection

    Authors: Sara Ahmadian, Vaggos Chatziafratis, Alessandro Epasto, Euiwoong Lee, Mohammad Mahdian, Konstantin Makarychev, Grigory Yaroslavtsev

    Abstract: Hierarchical Clustering is an unsupervised data analysis method which has been widely used for decades. Despite its popularity, it had an underdeveloped analytical foundation and to address this, Dasgupta recently introduced an optimization viewpoint of hierarchical clustering with pairwise similarity information that spurred a line of work shedding light on old algorithms (e.g., Average-Linkage),…

    Submitted 15 December, 2019; originally announced December 2019.

  16. arXiv:1912.01555 [pdf, other]

    cs.DC

    Evaluating Reliability of SSD-Based I/O Caches in Enterprise Storage Systems

    Authors: Saba Ahmadian, Farhad Taheri, Hossein Asadi

    Abstract: In this paper, we present a comprehensive analysis investigating the reliability of SSD-based I/O caching architectures used in enterprise storage systems under power failure and high operating temperature. We explore a variety of SSDs from top vendors and investigate the cache reliability in mirrored configuration. To this end, we first develop a physical fault injection and failure detection platf…

    Submitted 1 December, 2019; originally announced December 2019.

  17. arXiv:1909.06667 [pdf, ps, other]

    cs.IR

    LGLMF: Local Geographical based Logistic Matrix Factorization Model for POI Recommendation

    Authors: Hossein A. Rahmani, Mohammad Aliannejadi, Sajad Ahmadian, Mitra Baratchi, Mohsen Afsharchi, Fabio Crestani

    Abstract: With the rapid growth of Location-Based Social Networks, personalized Points of Interest (POIs) recommendation has become a critical task to help users explore their surroundings. Due to the scarcity of check-in data, the availability of geographical information offers an opportunity to improve the accuracy of POI recommendation. Moreover, matrix factorization methods provide effective models whic…

    Submitted 14 September, 2019; originally announced September 2019.

    Comments: 13 pages, 1 figure

  18. Clustering without Over-Representation

    Authors: Sara Ahmadian, Alessandro Epasto, Ravi Kumar, Mohammad Mahdian

    Abstract: In this paper we consider clustering problems in which each point is endowed with a color. The goal is to cluster the points to minimize the classical clustering cost but with the additional constraint that no color is over-represented in any cluster. This problem is motivated by practical clustering settings, e.g., in clustering news articles where the color of an article is its source, it is pre…

    Submitted 29 May, 2019; originally announced May 2019.

    Comments: 10 pages, 6 figures, in KDD 2019

    ACM Class: I.5.3; G.1.6; H.2.8; F.2.2

    Journal ref: in Proceedings of The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2019

  19. arXiv:1902.09329 [pdf]

    cs.GT

    A New Method To Find The Nash Equilibrium Point in Financial Transmission Rights Bidding Problem

    Authors: Saeed Ahmadian, Ramin Farajifijani

    Abstract: Financial transmission right (FTR) is an important tool, especially for stopping congestion charges in restructured electricity markets. Participants in the transmission market, as players, are assumed to be generation companies (Gencos), which also take part in an energy market and are able to buy their required FTRs. In this regard, there are two types of FTR: obligation or option. There a…

    Submitted 21 February, 2019; originally announced February 2019.

    Comments: 2 figs, 6 tables, Energy Economics Journals (Elsevier)

  20. arXiv:1812.08720 [pdf, other]

    cs.PF

    LBICA: A Load Balancer for I/O Cache Architectures

    Authors: Saba Ahmadian, Reza Salkhordeh, Hossein Asadi

    Abstract: In recent years, enterprise Solid-State Drives (SSDs) have been used in the caching layer of high-performance servers to close the growing performance gap between processing units and the storage subsystem. SSD-based I/O caching is typically not effective in workloads with burst accesses in which the caching layer itself becomes the performance bottleneck because of the large number of accesses. Existing I/…

    Submitted 5 December, 2018; originally announced December 2018.

    Comments: 6 pages

  21. ECI-Cache: A High-Endurance and Cost-Efficient I/O Caching Scheme for Virtualized Platforms

    Authors: Saba Ahmadian, Onur Mutlu, Hossein Asadi

    Abstract: In recent years, high interest in using Virtual Machines (VMs) in data centers and Cloud computing has significantly increased the demand for high-performance data storage systems. Recent studies suggest using SSDs as a caching layer for HDD-based storage subsystems in virtualization platforms. Such studies neglect to address the endurance and cost of SSDs, which can significantly affect the effic…

    Submitted 2 May, 2018; originally announced May 2018.

    Journal ref: Proceedings of the ACM on Measurement and Analysis of Computing Systems 2.1 (2018): 9

  22. Investigating Power Outage Effects on Reliability of Solid-State Drives

    Authors: Saba Ahmadian, Farhad Taheri, Mehrshad Lotfi, Maryam Karimi, Hossein Asadi

    Abstract: Solid-State Drives (SSDs) have recently been employed in enterprise servers and high-end storage systems in order to enhance the performance of the storage subsystem. Although employing high-speed SSDs in the storage subsystems can significantly improve system performance, it comes with a significant reliability threat for write operations upon power failures. In this paper, we present a comprehensive analysis in…

    Submitted 29 April, 2018; originally announced May 2018.

    Comments: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018. IEEE, 2018

  23. arXiv:1705.10396 [pdf, other]

    cs.DS

    Further Approximations for Demand Matching: Matroid Constraints and Minor-Closed Graphs

    Authors: Sara Ahmadian, Zachary Friggstad

    Abstract: We pursue a study of the Generalized Demand Matching problem, a common generalization of the $b$-Matching and Knapsack problems. Here, we are given a graph with vertex capacities, edge profits, and asymmetric demands on the edges. The goal is to find a maximum-profit subset of edges so the demands of chosen edges do not violate vertex capacities. This problem is APX-hard and constant-factor approx…

    Submitted 29 May, 2017; originally announced May 2017.

  24. arXiv:1612.07925 [pdf, ps, other]

    cs.DS

    Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms

    Authors: Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, Justin Ward

    Abstract: Clustering is a classic topic in optimization with $k$-means being one of the most fundamental such problems. In the absence of any restrictions on the input, the best known algorithm for $k$-means with a provable guarantee is a simple local search heuristic yielding an approximation guarantee of $9+ε$, a ratio that is known to be tight with respect to such methods. We overcome this barrier by p…

    Submitted 10 April, 2017; v1 submitted 23 December, 2016; originally announced December 2016.

  25. arXiv:1608.01700 [pdf, other]

    cs.DS

    Approximation Algorithms for Clustering Problems with Lower Bounds and Outliers

    Authors: Sara Ahmadian, Chaitanya Swamy

    Abstract: We consider clustering problems with non-uniform lower bounds and outliers, and obtain the first approximation guarantees for these problems. We have a set $\mathcal{F}$ of facilities with lower bounds $\{L_i\}_{i\in\mathcal{F}}$ and a set $\mathcal{D}$ of clients located in a common metric space $\{c(i,j)\}_{i,j\in\mathcal{F}\cup\mathcal{D}}$, and bounds $k$, $m$. A feasible solution is a pair…

    Submitted 3 November, 2016; v1 submitted 4 August, 2016; originally announced August 2016.

    ACM Class: F.2.2; G.1.6; G.2

  26. arXiv:1301.4478 [pdf, other]

    cs.DS cs.DM

    Local-Search based Approximation Algorithms for Mobile Facility Location Problems

    Authors: Sara Ahmadian, Zachary Friggstad, Chaitanya Swamy

    Abstract: We consider the mobile facility location (MFL) problem. We are given a set of facilities and clients located in a common metric space. The goal is to move each facility from its initial location to a destination and assign each client to the destination of some facility so as to minimize the sum of the movement-costs of the facilities and the client-assignment costs. This abstracts facility…

    Submitted 18 January, 2013; originally announced January 2013.

    ACM Class: F.2.2; G.1.6; G.2.1

  27. arXiv:1104.3128 [pdf, ps, other]

    cs.DS cs.DM

    Improved Approximation Guarantees for Lower-Bounded Facility Location

    Authors: Sara Ahmadian, Chaitanya Swamy

    Abstract: We consider the lower-bounded facility location (LBFL) problem (also sometimes called load-balanced facility location), which is a generalization of uncapacitated facility location (UFL), where each open facility is required to serve a certain minimum amount of demand. More formally, an instance $\mathcal{I}$ of LBFL is specified by a set $\mathcal{F}$ of facilities with facility-opening…

    Submitted 29 August, 2012; v1 submitted 15 April, 2011; originally announced April 2011.
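
Entry 3 describes cardinality sketches as small randomized set representations that support set merges and approximate distinct-count queries. As a rough illustration only, the Python sketch below implements a generic bottom-k (k-minimum-values) sketch. It is not the construction analyzed in that paper, and it does nothing about the adaptive-query vulnerability the paper studies. The class name `BottomKSketch`, the use of SHA-256 as the hash, and the estimator (k - 1) / h_k, where h_k is the k-th smallest kept hash value, are all assumptions of this example.

```python
import hashlib
import heapq


class BottomKSketch:
    """Hypothetical bottom-k (KMV) cardinality sketch: keeps the k smallest
    distinct hash values of the inserted items, each mapped into [0, 1)."""

    def __init__(self, k: int = 64):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self._heap = []        # max-heap via negated values; at most k hashes
        self._members = set()  # hash values currently kept, for duplicate checks

    @staticmethod
    def _hash(item) -> float:
        # Map an item to a pseudo-uniform value in [0, 1) using SHA-256.
        digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def _offer(self, h: float) -> None:
        # Keep h only if it is new and among the k smallest seen so far.
        if h in self._members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            evicted = -heapq.heapreplace(self._heap, -h)  # drop current max
            self._members.discard(evicted)
            self._members.add(h)

    def add(self, item) -> None:
        self._offer(self._hash(item))

    def merge(self, other: "BottomKSketch") -> "BottomKSketch":
        # The bottom-k of a union equals the bottom-k of both sketches combined.
        if other.k != self.k:
            raise ValueError("sketches must share the same k")
        out = BottomKSketch(self.k)
        for h in self._members | other._members:
            out._offer(h)
        return out

    def estimate(self) -> float:
        # With fewer than k distinct hashes, the sketch has seen every item.
        if len(self._members) < self.k:
            return float(len(self._members))
        kth_smallest = -self._heap[0]
        return (self.k - 1) / kth_smallest


if __name__ == "__main__":
    a, b = BottomKSketch(k=256), BottomKSketch(k=256)
    for i in range(10_000):
        a.add(i)
    for i in range(5_000, 15_000):
        b.add(i)
    # The union has 15,000 distinct items; the estimate should be close to that.
    print(round(a.merge(b).estimate()))
```

Each sketch stores only k hash values regardless of how many items it sees, and merging two sketches needs nothing but their stored values. These are the two properties the abstract highlights. Larger k trades space for accuracy.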