Showing 1–7 of 7 results for author: Sarwar, Z

  1. arXiv:2504.12463 [pdf, other]

    cs.LG cs.AI

    Dense Backpropagation Improves Training for Sparse Mixture-of-Experts

    Authors: Ashwinee Panda, Vatsal Baherwani, Zain Sarwar, Benjamin Therien, Supriyo Chakraborty, Tom Goldstein

    Abstract: Mixture of Experts (MoE) pretraining is more scalable than dense Transformer pretraining, because MoEs learn to route inputs to a sparse set of their feedforward parameters. However, this means that MoEs only receive a sparse backward update, leading to training instability and suboptimal performance. We present a lightweight approximation method that gives the MoE router a dense gradient update w…

    Submitted 17 April, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

  2. arXiv:2503.05029 [pdf, other]

    cs.LG cs.AI cs.CL

    Continual Pre-training of MoEs: How robust is your router?

    Authors: Benjamin Thérien, Charles-Étienne Joseph, Zain Sarwar, Ashwinee Panda, Anirban Das, Shi-Xiong Zhang, Stephen Rawls, Sambit Sahu, Eugene Belilovsky, Irina Rish

    Abstract: Sparsely-activated Mixture of Experts (MoE) transformers are promising architectures for foundation models. Compared to dense transformers that require the same amount of floating point operations (FLOPs) per forward pass, MoEs benefit from improved sample efficiency at training time and achieve much stronger performance. Many closed-source and open-source frontier language models have thus adopte…

    Submitted 6 March, 2025; originally announced March 2025.

  3. arXiv:2410.08432 [pdf, other]

    cs.LG

    MYCROFT: Towards Effective and Efficient External Data Augmentation

    Authors: Zain Sarwar, Van Tran, Arjun Nitin Bhagoji, Nick Feamster, Ben Y. Zhao, Supriyo Chakraborty

    Abstract: Machine learning (ML) models often require large amounts of data to perform well. When the available data is limited, model trainers may need to acquire more data from external sources. Often, useful data is held by private entities who are hesitant to share their data due to proprietary and privacy concerns. This makes it challenging and expensive for model trainers to acquire the data they need to…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: 10 pages, 3 figures, 3 tables

  4. arXiv:2310.16191 [pdf, other]

    cs.CR

    Can Virtual Reality Protect Users from Keystroke Inference Attacks?

    Authors: Zhuolin Yang, Zain Sarwar, Iris Hwang, Ronik Bhaskar, Ben Y. Zhao, Haitao Zheng

    Abstract: Virtual Reality (VR) has gained popularity by providing immersive and interactive experiences without geographical limitations. It also provides a sense of personal privacy through physical separation. In this paper, we show that despite assumptions of enhanced privacy, VR is unable to shield its users from side-channel attacks that steal private information. Ironically, this vulnerability arises…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted by USENIX 2024

  5. arXiv:2210.09421 [pdf, other]

    cs.CR cs.CL cs.LG

    Deepfake Text Detection: Limitations and Opportunities

    Authors: Jiameng Pu, Zain Sarwar, Sifat Muhammad Abdullah, Abdullah Rehman, Yoonjin Kim, Parantapa Bhattacharya, Mobin Javed, Bimal Viswanath

    Abstract: Recent advances in generative models for language have enabled the creation of convincing synthetic text or deepfake text. Prior work has demonstrated the potential for misuse of deepfake text to mislead content consumers. Therefore, deepfake text detection, the task of discriminating between human and machine-generated text, is becoming increasingly critical. Several defenses have been proposed f…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: Accepted to IEEE S&P 2023; First two authors contributed equally to this work; 18 pages, 7 figures

  6. Thermodynamics of Bardeen regular black hole with generalized uncertainty principle

    Authors: Areeba Merriam, M. Zain Sarwar

    Abstract: This study explores the emission of massive charged spin-1 particles from the background of Bardeen regular spacetime by the semi-classical method used to study the Hawking radiation spectrum. We employed the Hamilton-Jacobi method and WKB approximation technique with the suitable form of the wave function to solve the Proca field equation. We calculated the tunneling probability of outgoing spin-…

    Submitted 21 October, 2021; originally announced October 2021.

    Comments: 13 pages, 4 figures

  7. arXiv:1910.07718 [pdf]

    eess.SP

    Multimetric Event-driven System for Long-Term Wireless Sensor Operation in SHM Application

    Authors: Muhammad Zohaib Sarwar, Muhammad Rakeh Saleem, Jong-Woong Park, Do-Soo Moon, Dong Joo Kim

    Abstract: Wireless sensor networks (WSNs) are promising solutions for large infrastructure monitoring because of their ease of installation, computing and communication capability, and cost-effectiveness. Long-term structural health monitoring (SHM), however, is still a challenge because it requires continuous data acquisition for the detection of random events such as earthquakes and structural collapse. T…

    Submitted 17 October, 2019; originally announced October 2019.

    Comments: 10 pages, 9 figures, 3 tables, Journal paper