Showing 1–24 of 24 results for author: Mustafa, H

Searching in archive cs.

  1. arXiv:2504.16584  [pdf, other]

    cs.CR cs.AI

    Case Study: Fine-tuning Small Language Models for Accurate and Private CWE Detection in Python Code

    Authors: Md. Azizul Hakim Bappy, Hossen A Mustafa, Prottoy Saha, Rajinus Salehat

    Abstract: Large Language Models (LLMs) have demonstrated significant capabilities in understanding and analyzing code for security vulnerabilities, such as Common Weakness Enumerations (CWEs). However, their reliance on cloud infrastructure and substantial computational requirements pose challenges for analyzing sensitive or proprietary codebases due to privacy concerns and inference costs. This work explor…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 11 pages, 2 figures, 3 tables. Dataset available at https://huggingface.co/datasets/floxihunter/synthetic_python_cwe. Model available at https://huggingface.co/floxihunter/codegen-mono-CWEdetect. Keywords: Small Language Models (SLMs), Vulnerability Detection, CWE, Fine-tuning, Python Security, Privacy-Preserving Code Analysis

  2. arXiv:2504.03732  [pdf, other]

    cs.AR cs.DC q-bio.GN

    SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Analysis

    Authors: Nika Mansouri Ghiasi, Talu Güloglu, Harun Mustafa, Can Firtina, Konstantina Koliogeorgi, Konstantinos Kanellopoulos, Haiyu Mao, Rakesh Nadig, Mohammad Sadrosadati, Jisung Park, Onur Mutlu

    Abstract: Given the exponentially growing volumes of genomic data, there are extensive efforts to accelerate genome analysis. We demonstrate a major bottleneck that greatly limits and diminishes the benefits of state-of-the-art genome analysis accelerators: the data preparation bottleneck, where genomic data is stored in compressed form and needs to be decompressed and formatted first before an accelerator…

    Submitted 21 April, 2025; v1 submitted 31 March, 2025; originally announced April 2025.

  3. arXiv:2406.19113  [pdf, other]

    cs.AR cs.DC q-bio.GN

    MegIS: High-Performance, Energy-Efficient, and Low-Cost Metagenomic Analysis with In-Storage Processing

    Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Mao, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

    Abstract: Metagenomics has led to significant advances in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases. Metagenomic analysis suffers from significant data movement overhead due to moving large amounts of low-reuse data from the storage system. In-storag…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: To appear in ISCA 2024. arXiv admin note: substantial text overlap with arXiv:2311.12527

  4. arXiv:2311.12527  [pdf, other]

    cs.AR q-bio.GN q-bio.QM

    MetaStore: High-Performance Metagenomic Analysis via In-Storage Computing

    Authors: Nika Mansouri Ghiasi, Mohammad Sadrosadati, Harun Mustafa, Arvid Gollwitzer, Can Firtina, Julien Eudine, Haiyu Ma, Joël Lindegger, Meryem Banu Cavlak, Mohammed Alser, Jisung Park, Onur Mutlu

    Abstract: Metagenomics has led to significant advancements in many fields. Metagenomic analysis commonly involves the key tasks of determining the species present in a sample and their relative abundances. These tasks require searching large metagenomic databases containing information on different species' genomes. Metagenomic analysis suffers from significant data movement overhead due to moving large amo…

    Submitted 21 November, 2023; originally announced November 2023.

  5. arXiv:2211.07854  [pdf, other]

    quant-ph cs.LG

    Variational Quantum Algorithms for Chemical Simulation and Drug Discovery

    Authors: Hasan Mustafa, Sai Nandan Morapakula, Prateek Jain, Srinjoy Ganguly

    Abstract: Quantum computing has gained a lot of attention recently, and scientists have seen potential applications of quantum computing ranging from cryptography and communication to machine learning and healthcare. Protein folding has been one of the most interesting areas to study, and it is also one of the biggest problems in biochemistry. Each protein folds distinctively, and the difficulty of…

    Submitted 14 November, 2022; originally announced November 2022.

  6. arXiv:2209.01147  [pdf, other]

    cs.DS cs.CG cs.LG stat.ML

    Algorithms for Discrepancy, Matchings, and Approximations: Fast, Simple, and Practical

    Authors: Mónika Csikós, Nabil H. Mustafa

    Abstract: We study one of the key tools in data approximation and optimization: low-discrepancy colorings. Formally, given a finite set system $(X,\mathcal S)$, the \emph{discrepancy} of a two-coloring $χ:X\to\{-1,1\}$ is defined as $\max_{S \in \mathcal S}|{χ(S)}|$, where $χ(S)=\sum\limits_{x \in S}χ(x)$. We propose a randomized algorithm which, for any $d>0$ and $(X,\mathcal S)$ with dual shatter functi…

    Submitted 2 September, 2022; originally announced September 2022.

  7. arXiv:2202.10400  [pdf, other]

    cs.AR cs.DC cs.OS q-bio.GN

    GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis

    Authors: Nika Mansouri Ghiasi, Jisung Park, Harun Mustafa, Jeremie Kim, Ataberk Olgun, Arvid Gollwitzer, Damla Senol Cali, Can Firtina, Haiyu Mao, Nour Almadhoun Alserr, Rachata Ausavarungnirun, Nandita Vijaykumar, Mohammed Alser, Onur Mutlu

    Abstract: Read mapping is a fundamental, yet computationally-expensive step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome (called a reference genome). To address the computational challenges in genome analysis, many prior works propose various approaches such as filters that select th…

    Submitted 6 April, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

    Comments: Published at ASPLOS 2022

  8. arXiv:2008.08970  [pdf, ps, other]

    cs.LG cs.CG math.CO stat.ML

    Optimal Approximations Made Easy

    Authors: Mónika Csikós, Nabil H. Mustafa

    Abstract: The fundamental result of Li, Long, and Srinivasan on approximations of set systems has become a key tool across several communities such as learning theory, algorithms, computational geometry, combinatorics and data analysis. The goal of this paper is to give a modular, self-contained, intuitive proof of this result for finite set systems. The only ingredient we assume is the standard Chernoff'…

    Submitted 1 September, 2022; v1 submitted 20 August, 2020; originally announced August 2020.

    Journal ref: Published in Information Processing Letters, Volume 176, June 2022, 106250

  9. arXiv:1911.04200  [pdf, other]

    cs.CE cs.DC cs.PF q-bio.GN

    Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons

    Authors: Maciej Besta, Raghavendra Kanakagiri, Harun Mustafa, Mikhail Karasikov, Gunnar Rätsch, Torsten Hoefler, Edgar Solomonik

    Abstract: The Jaccard similarity index is an important measure of the overlap of two sets, widely used in machine learning, computational genomics, information retrieval, and many other areas. We design and implement SimilarityAtScale, the first communication-efficient distributed algorithm for computing the Jaccard similarity among pairs of large datasets. Our algorithm provides an efficient encoding of th…

    Submitted 11 November, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

    Journal ref: Proceedings of the 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS'20), 2020

  10. arXiv:1909.13146  [pdf, other]

    q-bio.GN cs.LG stat.ML

    META$^\mathbf{2}$: Memory-efficient taxonomic classification and abundance estimation for metagenomics with deep learning

    Authors: Andreas Georgiou, Vincent Fortuin, Harun Mustafa, Gunnar Rätsch

    Abstract: Metagenomic studies have increasingly utilized sequencing technologies in order to analyze DNA fragments found in environmental samples. One important step in this analysis is the taxonomic classification of the DNA fragments. Conventional read classification methods require large databases and vast amounts of memory to run, with recent deep learning methods suffering from very large model sizes. W…

    Submitted 10 February, 2020; v1 submitted 28 September, 2019; originally announced September 2019.

  11. Novel Artificial Human Optimization Field Algorithms - The Beginning

    Authors: Satish Gajawada, Hassan Mustafa

    Abstract: New Artificial Human Optimization (AHO) Field Algorithms can be created from scratch or by adding the concept of Artificial Humans into other existing Optimization Algorithms. Particle Swarm Optimization (PSO) has been very popular for solving complex optimization problems due to its simplicity. In this work, new Artificial Human Optimization Field Algorithms are created by modifying existing PSO…

    Submitted 26 March, 2019; originally announced March 2019.

    Comments: 25 pages, 41 figures

    Journal ref: Transactions on Machine Learning and Artificial Intelligence (TMLAI), Volume 7, Issue 1, February 2019

  12. arXiv:1807.07924  [pdf, ps, other]

    cs.LG cs.CG stat.ML

    Optimal Bounds on the VC-dimension

    Authors: Monika Csikos, Andrey Kupavskii, Nabil H. Mustafa

    Abstract: The VC-dimension of a set system is a way to capture its complexity and has been a key parameter studied extensively in machine learning and geometry communities. In this paper, we resolve two longstanding open problems on bounding the VC-dimension of two fundamental set systems: $k$-fold unions/intersections of half-spaces, and the simplices set system. Among other implications, it settles an ope…

    Submitted 20 July, 2018; originally announced July 2018.

  13. arXiv:1806.08725  [pdf, other]

    math.MG cs.CG math.CO math.PR

    Theorems of Carathéodory, Helly, and Tverberg without dimension

    Authors: Karim Adiprasito, Imre Bárány, Nabil H. Mustafa, Tamás Terpai

    Abstract: We prove a no-dimensional version of Carathéodory's theorem: given an $n$-element set $P\subset \Re^d$, a point $a \in \conv P$, and an integer $r\le d$, $r \le n$, there is a subset $Q\subset P$ of $r$ elements such that the distance between $a$ and $\conv Q$ is less than $\diam P/\sqrt {2r}$. A general no-dimension Helly type result is also proved with colourful and fractional consequences. Simil…

    Submitted 28 August, 2019; v1 submitted 22 June, 2018; originally announced June 2018.

    Comments: 23 pages, 1 figure

  14. arXiv:1711.01198  [pdf]

    cs.CR

    Design and Analysis of a Secure Three Factor User Authentication Scheme Using Biometric and Smart Card

    Authors: Hossen Asiful Mustafa, Hasan Muhammad Kafi

    Abstract: Password security can no longer provide enough security in the area of remote user authentication. Considering this security drawback, researchers are trying to find a solution with multifactor remote user authentication systems. Recently, three-factor remote user authentication using biometrics and smart cards has drawn considerable attention from researchers. However, most of the current proposed…

    Submitted 3 November, 2017; originally announced November 2017.

    Comments: 12 pages, 6 figures, 2 tables

    Journal ref: International Journal of Computer Science and Information Security (IJCSIS), Vol. 15, No. 6, June 2017

  15. arXiv:1708.01590  [pdf, other]

    math.CO cs.DM math.MG

    Bounding the size of an almost-equidistant set in Euclidean space

    Authors: Andrey Kupavskii, Nabil H. Mustafa, Konrad J. Swanepoel

    Abstract: A set of points in d-dimensional Euclidean space is almost equidistant if among any three points of the set, some two are at distance 1. We show that an almost-equidistant set in $\mathbb{R}^d$ has cardinality $O(d^{4/3})$.

    Submitted 4 August, 2017; originally announced August 2017.

    Comments: 6 pages

    MSC Class: 52C10

    Journal ref: Combinator. Probab. Comp. 28 (2019) 280-286

  16. arXiv:1702.03676  [pdf, ps, other]

    cs.CG math.CO math.PR

    Epsilon-approximations and epsilon-nets

    Authors: Nabil H. Mustafa, Kasturi R. Varadarajan

    Abstract: The use of random samples to approximate properties of geometric configurations has been an influential idea for both combinatorial and algorithmic purposes. This chapter considers two related notions---$ε$-approximations and $ε$-nets---that capture the most important quantitative properties that one would expect from a random sample with respect to an underlying geometric configuration.

    Submitted 8 August, 2017; v1 submitted 13 February, 2017; originally announced February 2017.

    Comments: Chapter 47 in Handbook on Discrete and Computational Geometry, 3rd edition. 27 pages

  17. arXiv:1606.03668  [pdf, other]

    cs.IT

    Spatial and Social Paradigms for Interference and Coverage Analysis in Underlay D2D Network

    Authors: Hafiz Attaul Mustafa, Muhammad Zeeshan Shakir, Muhammad Ali Imran, Rahim Tafazolli

    Abstract: The homogeneous Poisson point process (PPP) is widely used to model the spatial distribution of base stations and mobile terminals. The same process can be used to model an underlay device-to-device (D2D) network; however, neglecting the homophilic relation for D2D pairing presents underestimated system insights. In this paper, we model both spatial and social distributions of interfering D2D nodes as proxim…

    Submitted 28 April, 2017; v1 submitted 12 June, 2016; originally announced June 2016.

    Comments: 10 pages, 10 figures

  18. Separation Framework: An Enabler for Cooperative and D2D Communication for Future 5G Networks

    Authors: Hafiz Attaul Mustafa, Muhammad Ali Imran, Muhammad Zeeshan Shakir, Ali Imran, Rahim Tafazolli

    Abstract: Soaring capacity and coverage demands dictate that future cellular networks need to soon migrate towards ultra-dense networks. However, network densification comes with a host of challenges that include compromised energy efficiency, complex interference management, cumbersome mobility management, burdensome signaling overheads and higher backhaul costs. Interestingly, most of the problems, that b…

    Submitted 10 April, 2016; originally announced April 2016.

    Comments: 28 pages, 11 figures, IEEE Communications Surveys & Tutorials 2015

  19. Coverage gain and Device-to-Device user Density: Stochastic Geometry Modeling and Analysis

    Authors: Hafiz Attaul Mustafa, Muhammad Zeeshan Shakir, Muhammad Ali Imran, Ali Imran, Rahim Tafazolli

    Abstract: Device-to-device (D2D) communication has huge potential for capacity and coverage enhancements for next generation cellular networks. The number of potential nodes for D2D communication is an important parameter that directly impacts the system capacity. In this paper, we derive an analytic expression for the average coverage probability of a cellular user and the corresponding number of potential D2D users. I…

    Submitted 5 March, 2016; originally announced March 2016.

    Comments: 4 pages, 5 figures

    Journal ref: IEEE Communications Letters, Volume 19, Issue 10, pp. 1742-1745, 2015

  20. arXiv:1603.01694  [pdf, ps, other]

    cs.IT

    Intracell Interference Characterization and Cluster Inference for D2D Communication

    Authors: Hafiz Attaul Mustafa, Muhammad Zeeshan Shakir, Ali Riza Ekti, Muhammad Ali Imran, Rahim Tafazolli

    Abstract: The homogeneous Poisson point process (PPP) is widely used to model temporal, spatial or both topologies of base stations (BSs) and mobile terminals (MTs). However, negative spatial correlation in BSs, due to strategic deployments, and positive spatial correlations in MTs, due to homophilic relations, cannot be captured by homogeneous spatial PPP (SPPP). In this paper, we assume doubly stochasti…

    Submitted 5 March, 2016; originally announced March 2016.

    Comments: 11 pages, 14 figures

  21. arXiv:1509.04020  [pdf, ps, other]

    cs.CG

    A Note on the Size-Sensitive Packing Lemma

    Authors: Nabil H. Mustafa

    Abstract: We show that the size-sensitive packing lemma follows from a simple modification of the standard proof, due to Haussler and simplified by Chazelle, of the packing lemma.

    Submitted 15 September, 2015; v1 submitted 14 September, 2015; originally announced September 2015.

    Comments: Modified title of the paper. 2 pages

  22. arXiv:1501.03246  [pdf, other]

    cs.CG

    Tighter Estimates for epsilon-nets for Disks

    Authors: Norbert Bus, Shashwat Garg, Nabil H. Mustafa, Saurabh Ray

    Abstract: The geometric hitting set problem is one of the basic geometric combinatorial optimization problems: given a set $P$ of points, and a set $\mathcal{D}$ of geometric objects in the plane, the goal is to compute a small-sized subset of $P$ that hits all objects in $\mathcal{D}$. In 1994, Brönnimann and Goodrich made an important connection of this problem to the size of fundamental combinatorial stru…

    Submitted 13 January, 2015; originally announced January 2015.

  23. arXiv:1403.0835  [pdf, other]

    cs.CG

    QPTAS for Geometric Set-Cover Problems via Optimal Separators

    Authors: Nabil H. Mustafa, Rajiv Raman, Saurabh Ray

    Abstract: Weighted geometric set-cover problems arise naturally in several geometric and non-geometric settings (e.g. the breakthrough of Bansal-Pruhs (FOCS 2010) reduces a wide class of machine scheduling problems to weighted geometric set-cover). More than two decades of research has succeeded in settling the $(1+ε)$-approximability status for most geometric set-cover problems, except for four basic scena…

    Submitted 5 April, 2014; v1 submitted 4 March, 2014; originally announced March 2014.

    Comments: 26 pages. Revised to include an additional set-cover QPTAS for halfspaces

  24. arXiv:1002.4831  [pdf]

    cs.NE

    On Analysis and Evaluation of Multi-Sensory Cognitive Learning of a Mathematical Topic Using Artificial Neural Networks

    Authors: F. A. Al-Zahrani, H. M. Mustafa, A. Al-Hamadi

    Abstract: This piece of research belongs to the field of educational assessment based upon the cognitive theory of multimedia learning. According to that theory, visual and auditory material should be presented simultaneously to reinforce the retention of a learned mathematical topic. A carefully designed computer-assisted learning (CAL) module is developed as a multimedia tutorial for our suggested mathem…

    Submitted 25 February, 2010; originally announced February 2010.

    Comments: Journal of Telecommunications, Volume 1, Issue 1, pp. 99-104, February 2010