Showing 1–41 of 41 results for author: Awan, J

  1. arXiv:2505.00877  [pdf, other]

    stat.CO

    Particle Filter for Bayesian Inference on Privatized Data

    Authors: Yu-Wei Chen, Pranav Sanghi, Jordan Awan

    Abstract: Differential Privacy (DP) is a probabilistic framework that protects privacy while preserving data utility. To protect the privacy of the individuals in the dataset, DP requires adding a precise amount of noise to a statistic of interest; however, this noise addition alters the resulting sampling distribution, making statistical inference challenging. One of the main DP goals in Bayesian analysis…

    Submitted 1 May, 2025; originally announced May 2025.

  2. arXiv:2504.10166  [pdf, other]

    cs.MM

    Fact-Checking with Contextual Narratives: Leveraging Retrieval-Augmented LLMs for Social Media Analysis

    Authors: Arka Ujjal Dey, Muhammad Junaid Awan, Georgia Channing, Christian Schroeder de Witt, John Collomosse

    Abstract: We propose CRAVE (Cluster-based Retrieval Augmented Verification with Explanation), a novel framework that integrates retrieval-augmented Large Language Models (LLMs) with clustering techniques to address fact-checking challenges on social media. CRAVE automatically retrieves multimodal evidence from diverse, often contradictory, sources. Evidence is clustered into coherent narratives, and evaluat…

    Submitted 14 April, 2025; originally announced April 2025.

  3. arXiv:2501.18121  [pdf, other]

    stat.ML cs.CR cs.LG math.ST

    Optimal Survey Design for Private Mean Estimation

    Authors: Yu-Wei Chen, Raghu Pasupathy, Jordan A. Awan

    Abstract: This work identifies the first privacy-aware stratified sampling scheme that minimizes the variance for general private mean estimation under the Laplace, Discrete Laplace (DLap) and Truncated-Uniform-Laplace (TuLap) mechanisms within the framework of differential privacy (DP). We view stratified sampling as a subsampling operation, which amplifies the privacy guarantee; however, to have the same…

    Submitted 29 January, 2025; originally announced January 2025.

  4. arXiv:2412.14503  [pdf, other]

    stat.CO

    dapper: Data Augmentation for Private Posterior Estimation in R

    Authors: Kevin Eng, Jordan A. Awan, Nianqiao Phyllis Ju, Vinayak A. Rao, Ruobin Gong

    Abstract: This paper serves as a reference and introduction to using the R package dapper. dapper encodes a sampling framework which allows exact Markov chain Monte Carlo simulation of parameters and latent variables in a statistical model given privatized data. The goal of this package is to fill an urgent need by providing applied researchers with a flexible tool to perform valid Bayesian inference on dat…

    Submitted 18 December, 2024; originally announced December 2024.

  5. arXiv:2410.17468  [pdf, other]

    cs.CR stat.AP

    Formal Privacy Guarantees with Invariant Statistics

    Authors: Young Hyun Cho, Jordan Awan

    Abstract: Motivated by the 2020 US Census products, this paper extends differential privacy (DP) to address the joint release of DP outputs and nonprivate statistics, referred to as invariant. Our framework, Semi-DP, redefines adjacency by focusing on datasets that conform to the given invariant, ensuring indistinguishability between adjacent datasets within invariant-conforming datasets. We further develop…

    Submitted 22 October, 2024; originally announced October 2024.

  6. arXiv:2410.14789  [pdf, ps, other]

    stat.ME cs.CR cs.LG

    Differentially Private Covariate Balancing Causal Inference

    Authors: Yuki Ohnishi, Jordan Awan

    Abstract: Differential privacy is the leading mathematical framework for privacy protection, providing a probabilistic guarantee that safeguards individuals' private information when publishing statistics from a dataset. This guarantee is achieved by applying a randomized algorithm to the original data, which introduces unique challenges in data analysis by distorting inherent patterns. In particular, causa…

    Submitted 18 October, 2024; originally announced October 2024.

    Comments: 30 pages

  7. arXiv:2409.04387  [pdf, ps, other]

    stat.CO cs.CR stat.AP

    Best Linear Unbiased Estimate from Privatized Contingency Tables

    Authors: Jordan Awan, Adam Edwards, Paul Bartholomew, Andrew Sillers

    Abstract: In differential privacy (DP) mechanisms, it can be beneficial to release "redundant" outputs, where some quantities can be estimated in multiple ways by combining different privatized values. Indeed, the DP 2020 Decennial Census products published by the U.S. Census Bureau consist of such redundant noisy counts. When redundancy is present, the DP output can be improved by enforcing self-consiste…

    Submitted 30 June, 2025; v1 submitted 6 September, 2024; originally announced September 2024.

    Comments: 25 pages before references and appendices, 41 pages total, 2 figures and 7 tables

    MSC Class: 62-08; 62P25; 68P27

  8. arXiv:2408.14441  [pdf, other]

    cs.CV cs.AI

    Attend-Fusion: Efficient Audio-Visual Fusion for Video Classification

    Authors: Mahrukh Awan, Asmar Nadeem, Muhammad Junaid Awan, Armin Mustafa, Syed Sameed Husain

    Abstract: Exploiting both audio and visual modalities for video classification is a challenging task, as the existing methods require large model architectures, leading to high computational complexity and resource requirements. Smaller architectures, on the other hand, struggle to achieve optimal performance. In this paper, we propose Attend-Fusion, an audio-visual (AV) fusion approach that introduces a co…

    Submitted 26 August, 2024; originally announced August 2024.

  9. arXiv:2406.06231  [pdf, ps, other]

    math.ST cs.CR stat.CO

    Statistical Inference for Privatized Data with Unknown Sample Size

    Authors: Jordan Awan, Andres Felipe Barrientos, Nianqiao Ju

    Abstract: We develop both theory and algorithms to analyze privatized data in the unbounded differential privacy (DP) setting, where even the sample size is considered a sensitive quantity that requires privacy protection. We show that the distance between the sampling distributions under unbounded DP and bounded DP goes to zero as the sample size $n$ goes to infinity, provided that the noise used to privatize $n$ i…

    Submitted 30 June, 2025; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: 20 pages before references, 42 pages in total, 4 figures, 4 tables

  10. arXiv:2308.08343  [pdf, other]

    cs.CR math.PR math.ST

    Optimizing Noise for $f$-Differential Privacy via Anti-Concentration and Stochastic Dominance

    Authors: Jordan Awan, Aishwarya Ramasethu

    Abstract: In this paper, we establish anti-concentration inequalities for additive noise mechanisms which achieve $f$-differential privacy ($f$-DP), a notion of privacy phrased in terms of a tradeoff function $f$ which limits the ability of an adversary to determine which individuals were in the database. We show that canonical noise distributions (CNDs), proposed by Awan and Vadhan (2023), match the anti-c…

    Submitted 11 November, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: 20 pages before appendix, 32 pages total, 6 figures

    MSC Class: 68P27; 60E15

  11. arXiv:2305.03609  [pdf, other]

    stat.ML cs.CG cs.CR cs.LG math.AT

    Differentially Private Topological Data Analysis

    Authors: Taegyu Kang, Sehwan Kim, Jinwon Sohn, Jordan Awan

    Abstract: This paper is the first to attempt differentially private (DP) topological data analysis (TDA), producing near-optimal private persistence diagrams. We analyze the sensitivity of persistence diagrams in terms of the bottleneck distance, and we show that the commonly used Čech complex has sensitivity that does not decrease as the sample size $n$ increases. This makes it challenging for the persiste…

    Submitted 3 November, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

    Comments: 23 pages before references and appendices, 42 pages total, 8 figures

  12. arXiv:2303.05328  [pdf, other]

    math.ST cs.CR stat.ME

    Simulation-based, Finite-sample Inference for Privatized Data

    Authors: Jordan Awan, Zhanyu Wang

    Abstract: Privacy protection methods, such as differentially private mechanisms, introduce noise into resulting statistics which often produces complex and intractable sampling distributions. In this paper, we propose a simulation-based "repro sample" approach to produce statistically valid confidence intervals and hypothesis tests, which builds on the work of Xie and Wang (2022). We show that this methodol…

    Submitted 6 November, 2024; v1 submitted 9 March, 2023; originally announced March 2023.

    Comments: 25 pages before references and appendices, 42 pages total, 10 figures, 9 tables

  13. arXiv:2301.01616  [pdf, other]

    stat.ME

    Locally Private Causal Inference for Randomized Experiments

    Authors: Yuki Ohnishi, Jordan Awan

    Abstract: Local differential privacy is a differential privacy paradigm in which individuals first apply a privacy mechanism to their data (often by adding noise) before transmitting the result to a curator. The noise for privacy results in additional bias and variance in their analyses. Thus it is of great importance for analysts to incorporate the privacy noise into valid inference. In this article, we de…

    Submitted 14 October, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

    Comments: 36 pages

  14. arXiv:2210.06140  [pdf, other]

    stat.ML cs.CR cs.DS cs.LG

    Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies

    Authors: Zhanyu Wang, Guang Cheng, Jordan Awan

    Abstract: Differentially private (DP) mechanisms protect individual-level information by introducing randomness into the statistical analysis procedure. Despite the availability of numerous DP tools, there remains a lack of general techniques for conducting statistical inference under DP. We examine a DP bootstrap procedure that releases multiple private bootstrap estimates to infer the sampling distributio…

    Submitted 2 October, 2024; v1 submitted 12 October, 2022; originally announced October 2022.

  15. arXiv:2208.06236  [pdf, other]

    stat.ME cs.CR

    Differentially Private Kolmogorov-Smirnov-Type Tests

    Authors: Jordan Awan, Yue Wang

    Abstract: Hypothesis testing is a central problem in statistical analysis, and there is currently a lack of differentially private tests which are both statistically valid and powerful. In this paper, we develop several new differentially private (DP) nonparametric hypothesis tests. Our tests are based on Kolmogorov-Smirnov, Kuiper, Cramér-von Mises, and Wasserstein test statistics, which can all be express…

    Submitted 30 October, 2023; v1 submitted 12 August, 2022; originally announced August 2022.

    Comments: 19 pages before appendix and references. 3 Figures

  16. arXiv:2206.04572  [pdf, other]

    cs.CR math.ST

    Log-Concave and Multivariate Canonical Noise Distributions for Differential Privacy

    Authors: Jordan Awan, Jinshuo Dong

    Abstract: A canonical noise distribution (CND) is an additive mechanism designed to satisfy $f$-differential privacy ($f$-DP), without any wasted privacy budget. $f$-DP is a hypothesis testing-based formulation of privacy phrased in terms of tradeoff functions, which captures the difficulty of a hypothesis test. In this paper, we consider the existence and construction of both log-concave CNDs and multivari…

    Submitted 5 October, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 10 pages before references, 1 Figure

  17. arXiv:2206.00710  [pdf, other]

    stat.ME stat.CO

    Data Augmentation MCMC for Bayesian Inference from Privatized Data

    Authors: Nianqiao Ju, Jordan A. Awan, Ruobin Gong, Vinayak A. Rao

    Abstract: Differentially private mechanisms protect privacy by introducing additional randomness into the data. Restricting access to only the privatized data makes it challenging to perform valid statistical inference on parameters underlying the confidential data. Specifically, the likelihood function of the privatized data requires integrating over the large space of confidential databases and is typical…

    Submitted 7 December, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: 17 pages, 3 figures, 2 tables. NeurIPS 2022

  18. arXiv:2204.00162  [pdf, other]

    math.CO

    Tutte polynomials for regular oriented matroids

    Authors: Jordan Awan, Olivier Bernardi

    Abstract: The Tutte polynomial is a fundamental invariant of graphs and matroids. In this article, we define a generalization of the Tutte polynomial to oriented graphs and regular oriented matroids. To any regular oriented matroid $N$, we associate a polynomial invariant $A_N(q,y,z)$, which we call the A-polynomial. The A-polynomial has the following interesting properties among many others: 1. a special…

    Submitted 11 October, 2023; v1 submitted 31 March, 2022; originally announced April 2022.

  19. arXiv:2112.12836  [pdf, other]

    cs.AR cs.DC

    Towards Hardware Support for FPGA Resource Elasticity

    Authors: Ahsan Javed Awan, Fidan Aliyeva

    Abstract: FPGAs are increasingly being deployed in the cloud to accelerate diverse applications. They are to be shared among multiple tenants to improve the total cost of ownership. Partial reconfiguration technology enables multi-tenancy on FPGA by partitioning it into regions, each hosting a specific application's accelerator. However, the regions' sizes cannot be changed once they are defined, resulting…

    Submitted 4 July, 2022; v1 submitted 23 December, 2021; originally announced December 2021.

    Comments: Preprint of paper presented at Euromicro Conference on Digital System Design (DSD'22)

  20. arXiv:2108.04303  [pdf, other]

    cs.CR math.ST

    Canonical Noise Distributions and Private Hypothesis Tests

    Authors: Jordan Awan, Salil Vadhan

    Abstract: $f$-DP has recently been proposed as a generalization of differential privacy allowing a lossless analysis of composition, post-processing, and privacy amplification via subsampling. In the setting of $f$-DP, we propose the concept of a canonical noise distribution (CND), the first mechanism designed for an arbitrary $f…

    Submitted 13 January, 2023; v1 submitted 9 August, 2021; originally announced August 2021.

    Comments: 23 pages + references and appendix. 4 figures

  21. arXiv:2108.00965  [pdf, other]

    cs.CR cs.CY stat.CO

    Privacy-Aware Rejection Sampling

    Authors: Jordan Awan, Vinayak Rao

    Abstract: Differential privacy (DP) offers strong theoretical privacy guarantees, but implementations of DP mechanisms may be vulnerable to side-channel attacks, such as timing attacks. When sampling methods such as MCMC or rejection sampling are used to implement a mechanism, the runtime can leak private information. We characterize the additional privacy cost due to the runtime of a rejection sampler in t…

    Submitted 29 September, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

    Comments: 25 pages + references, 4 figures

  22. arXiv:2106.15284  [pdf, other]

    cs.AR cs.PF

    NMPO: Near-Memory Computing Profiling and Offloading

    Authors: Stefano Corda, Madhurya Kumaraswamy, Ahsan Javed Awan, Roel Jordans, Akash Kumar, Henk Corporaal

    Abstract: Real-world applications are now processing big-data sets, often bottlenecked by the data movement between the compute units and the main memory. Near-memory computing (NMC), a modern data-centric computational paradigm, can alleviate these bottlenecks, thereby improving the performance of applications. The lack of NMC system availability makes simulators the primary evaluation tool for performance…

    Submitted 29 June, 2021; originally announced June 2021.

    Comments: Euromicro Conference on Digital System Design 2021

  23. arXiv:2106.14141  [pdf, other]

    math.CO

    Demicaps in AG(4,3) and Their Relation to Maximal Cap Partitions

    Authors: Jordan Awan, Clare Frechette, Yumi Li, Elizabeth McMahon

    Abstract: In this paper, we introduce a fundamental substructure of maximal caps in the affine geometry $AG(4,3)$ that we call demicaps. Demicaps provide a direct link to particular partitions of $AG(4,3)$ into 4 maximal caps plus a single point. The full collection of 36 maximal caps that are in exactly one partition with a given cap $C$ can be expressed as unions of two disjoint demicaps taken from…

    Submitted 27 June, 2021; originally announced June 2021.

    Comments: 19 pages, 16 figures

    MSC Class: 51E15; 51E22

  24. arXiv:2006.02397  [pdf, other]

    math.ST cs.CR stat.CO

    One Step to Efficient Synthetic Data

    Authors: Jordan Awan, Zhanrui Cai

    Abstract: A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators and whose joint distribution is inconsistent with the true distribution. Motivated by this, we propose a general method of producing synthetic data, which is widely applicable for parametric models, has asymptotically efficient…

    Submitted 26 July, 2024; v1 submitted 3 June, 2020; originally announced June 2020.

    Comments: 30 pages before references and appendices

  25. arXiv:2005.04098  [pdf, other]

    cs.DC

    Near Memory Acceleration on High Resolution Radio Astronomy Imaging

    Authors: Stefano Corda, Bram Veenboer, Ahsan Javed Awan, Akash Kumar, Roel Jordans, Henk Corporaal

    Abstract: Modern radio telescopes like the Square Kilometer Array (SKA) will need to process exabytes of radio-astronomical signals in real time to construct a high-resolution map of the sky. Near-Memory Computing (NMC) could alleviate the performance bottlenecks due to frequent memory accesses in a state-of-the-art radio-astronomy imaging algorithm. In this paper, we show that a sub-module performing a two…

    Submitted 4 May, 2020; originally announced May 2020.

  26. arXiv:1909.12183  [pdf, other]

    cs.ET quant-ph

    K-Means Clustering on Noisy Intermediate Scale Quantum Computers

    Authors: Sumsam Ullah Khan, Ahsan Javed Awan, Gemma Vall-Llosera

    Abstract: Real-time clustering of big performance data generated by the telecommunication networks requires domain-specific high performance compute infrastructure to detect anomalies. In this paper, we evaluate noisy intermediate-scale quantum (NISQ) computers characterized by low decoherence times, for K-means clustering and propose three strategies to generate shorter-depth quantum circuits needed to ove…

    Submitted 26 September, 2019; originally announced September 2019.

  27. arXiv:1909.11988  [pdf, other]

    cs.ET cs.LG quant-ph

    Support Vector Machines on Noisy Intermediate Scale Quantum Computers

    Authors: Jiaying Yang, Ahsan Javed Awan, Gemma Vall-Llosera

    Abstract: Support vector machine algorithms are considered essential for the implementation of automation in a radio access network. Specifically, they are critical in the prediction of the quality of user experience for video streaming based on device and network-level metrics. Quantum SVM is the quantum analogue of the classical SVM algorithm, which utilizes the properties of quantum computers to speed up…

    Submitted 26 September, 2019; originally announced September 2019.

  28. arXiv:1908.02640  [pdf, other]

    cs.AR cs.DC cs.PF

    Near-Memory Computing: Past, Present, and Future

    Authors: Gagandeep Singh, Lorenzo Chelini, Stefano Corda, Ahsan Javed Awan, Sander Stuijk, Roel Jordans, Henk Corporaal, Albert-Jan Boonstra

    Abstract: The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D integration technologies has made the decade-old concept of coupling compute units close to the memory, called near-memory computing (NMC), more viable. P…

    Submitted 7 August, 2019; originally announced August 2019.

    Comments: Preprint

  29. arXiv:1906.10037  [pdf, ps, other]

    cs.PF cs.ET

    Platform Independent Software Analysis for Near Memory Computing

    Authors: Stefano Corda, Gagandeep Singh, Ahsan Javed Awan, Roel Jordans, Henk Corporaal

    Abstract: Near-memory Computing (NMC) promises improved performance for the applications that can exploit the features of emerging memory technologies such as 3D-stacked memory. However, it is not trivial to find such applications and specialized tools are needed to identify them. In this paper, we present PISA-NMC, which extends a state-of-the-art hardware agnostic profiling tool with metrics concerning me…

    Submitted 24 June, 2019; originally announced June 2019.

    Journal ref: Euromicro Conference on Digital System Design (DSD) 2019

  30. arXiv:1905.09436  [pdf, other]

    cs.CR stat.ML

    KNG: The K-Norm Gradient Mechanism

    Authors: Matthew Reimherr, Jordan Awan

    Abstract: This paper presents a new mechanism for producing sanitized statistical summaries that achieve differential privacy, called the K-Norm Gradient Mechanism, or KNG. This new approach maintains the strong flexibility of the exponential mechanism, while achieving the powerful utility performance of objective perturbation. KNG starts with an inherent objective function (often an empirical…

    Submitted 2 August, 2021; v1 submitted 22 May, 2019; originally announced May 2019.

    Comments: 14 pages, 2 figures, published in NeurIPS 33

  31. arXiv:1905.09420  [pdf, ps, other]

    cs.CR math.ST

    Elliptical Perturbations for Differential Privacy

    Authors: Matthew Reimherr, Jordan Awan

    Abstract: We study elliptical distributions in locally convex vector spaces, and determine conditions when they can or cannot be used to satisfy differential privacy (DP). A requisite condition for a sanitized statistical summary to satisfy DP is that the corresponding privacy mechanism must induce equivalent measures for all possible input databases. We show that elliptical distributions with the same disp…

    Submitted 5 May, 2021; v1 submitted 22 May, 2019; originally announced May 2019.

    Comments: 13 pages. Published in NeurIPS 2019 (https://proceedings.neurips.cc/paper/2019/hash/b3dd760eb02d2e669c604f6b2f1e803f-Abstract.html). This Arxiv document corrects a few minor errors in the published version

    Journal ref: NeurIPS 32 (2019)

  32. arXiv:1904.08762  [pdf, other]

    cs.DC cs.AR cs.PF

    Memory and Parallelism Analysis Using a Platform-Independent Approach

    Authors: Stefano Corda, Gagandeep Singh, Ahsan Javed Awan, Roel Jordans, Henk Corporaal

    Abstract: Emerging computing architectures such as near-memory computing (NMC) promise improved performance for applications by reducing the data movement between CPU and memory. However, detecting such applications is not a trivial task. In this ongoing work, we extend the state-of-the-art platform-independent software analysis tool with NMC related metrics such as memory entropy, spatial locality, data-le…

    Submitted 18 April, 2019; originally announced April 2019.

    Comments: 22nd ACM International Workshop on Software and Compilers for Embedded Systems (SCOPES '19), May 2019

  33. arXiv:1904.00459  [pdf, other]

    math.ST cs.CR

    Differentially Private Inference for Binomial Data

    Authors: Jordan Awan, Aleksandra Slavkovic

    Abstract: We derive uniformly most powerful (UMP) tests for simple and one-sided hypotheses for a population proportion within the framework of Differential Privacy (DP), optimizing finite sample performance. We show that in general, DP hypothesis tests can be written in terms of linear constraints, and for exchangeable data can always be expressed as a function of the empirical distribution. Using this str…

    Submitted 31 March, 2019; originally announced April 2019.

    Comments: 25 pages before references; 39 pages total. 8 figures. arXiv admin note: text overlap with arXiv:1805.09236

  34. arXiv:1901.10864  [pdf, other]

    cs.CR cs.LG stat.ML

    Benefits and Pitfalls of the Exponential Mechanism with Applications to Hilbert Spaces and Functional PCA

    Authors: Jordan Awan, Ana Kenney, Matthew Reimherr, Aleksandra Slavković

    Abstract: The exponential mechanism is a fundamental tool of Differential Privacy (DP) due to its strong privacy guarantees and flexibility. We study its extension to settings with summaries based on infinite dimensional outputs such as with functional data analysis, shape analysis, and nonparametric statistics. We show that one can design the mechanism with respect to a specific base measure over the outpu…

    Submitted 30 January, 2019; originally announced January 2019.

    Comments: 13 pages, 5 images, 2 tables

    MSC Class: 46E22; 46S50; 60G15; 62H25

  35. arXiv:1805.09236  [pdf, other]

    math.ST

    Differentially Private Uniformly Most Powerful Tests for Binomial Data

    Authors: Jordan Awan, Aleksandra Slavkovic

    Abstract: We derive uniformly most powerful (UMP) tests for simple and one-sided hypotheses for a population proportion within the framework of Differential Privacy (DP), optimizing finite sample performance. We show that in general, DP hypothesis tests for exchangeable data can always be expressed as a function of the empirical distribution. Using this structure, we prove a 'Neyman-Pearson lemma' for binom…

    Submitted 23 May, 2018; originally announced May 2018.

    Comments: 15 pages, 2 figures

  36. arXiv:1801.09236  [pdf, other]

    stat.ME

    Structure and Sensitivity in Differential Privacy: Comparing K-Norm Mechanisms

    Authors: Jordan Awan, Aleksandra Slavkovic

    Abstract: Differential privacy (DP) provides a framework for provable privacy protection against arbitrary adversaries, while allowing the release of summary statistics and synthetic data. We address the problem of releasing a noisy real-valued statistic vector $T$, a function of sensitive data under DP, via the class of $K$-norm mechanisms with the goal of minimizing the noise added to achieve privacy. Fi…

    Submitted 31 October, 2024; v1 submitted 28 January, 2018; originally announced January 2018.

    Comments: 40 pages, 6 figures, 1 table

    MSC Class: 62J05; 62J07; 62J12; 68W20

  37. arXiv:1707.09323  [pdf, other]

    cs.DC

    Identifying the potential of Near Data Computing for Apache Spark

    Authors: Ahsan Javed Awan, Mats Brorsson, Vladimir Vlassov, Eduard Ayguade

    Abstract: While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics as a unified framework for both batch and stream data processing. There is also a renewed interest in Near Data Computing (NDC) due to technological advancement in the last decade. However, it is not known if NDC…

    Submitted 8 May, 2017; originally announced July 2017.

    Comments: position paper

  38. arXiv:1610.01839  [pdf, other]

    math.CO

    Tutte Polynomials for Directed Graphs

    Authors: Jordan Awan, Olivier Bernardi

    Abstract: The Tutte polynomial is a fundamental invariant of graphs. In this article, we define and study a generalization of the Tutte polynomial for directed graphs, that we name B-polynomial. The B-polynomial has three variables, but when specialized to the case of graphs (that is, digraphs where arcs come in pairs with opposite directions), one of the variables becomes redundant and the B-polynomial is…

    Submitted 29 December, 2018; v1 submitted 6 October, 2016; originally announced October 2016.

  39. arXiv:1604.08484  [pdf, other]

    cs.DC cs.AR cs.PF

    Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study

    Authors: Ahsan Javed Awan, Mats Brorsson, Vladimir Vlassov, Eduard Ayguade

    Abstract: While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics as a unified framework for both batch and stream data processing. However, recent studies on micro-architectural characterization of in-memory data analytics are limited to only batch processing workloads. We com…

    Submitted 28 April, 2016; originally announced April 2016.

  40. arXiv:1507.08340  [pdf, other]

    cs.DC cs.AR cs.PF

    How Data Volume Affects Spark Based Data Analytics on a Scale-up Server

    Authors: Ahsan Javed Awan, Mats Brorsson, Vladimir Vlassov, Eduard Ayguade

    Abstract: The sheer increase in the volume of data over the last decade has triggered research in cluster computing frameworks that enable web enterprises to extract big insights from big data. While Apache Spark is gaining popularity for exhibiting superior scale-out performance on commodity machines, the impact of data volume on the performance of Spark based data analytics in scale-up configuration is not we…

    Submitted 29 July, 2015; originally announced July 2015.

    Comments: accepted to 6th International Workshop on Big Data Benchmarks, Performance Optimization and Emerging Hardware (BpoE-6) held in conjunction with VLDB 2015. arXiv admin note: text overlap with arXiv:1506.07742

  41. Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server

    Authors: Ahsan Javed Awan, Mats Brorsson, Vladimir Vlassov, Eduard Ayguade

    Abstract: In the last decade, data analytics has rapidly progressed from traditional disk-based processing to modern in-memory processing. However, little effort has been devoted to enhancing performance at the micro-architecture level. This paper characterizes the performance of in-memory data analytics using the Apache Spark framework. We use a single-node NUMA machine and identify the bottlenecks hampering the scal…

    Submitted 25 June, 2015; originally announced June 2015.

    Comments: Accepted to The 5th IEEE International Conference on Big Data and Cloud Computing (BDCloud 2015)