Skip to main content

Showing 1–7 of 7 results for author: Soran, B

.
  1. arXiv:2502.02631  [pdf, other

    cs.LG cs.AI cs.CL cs.CV

    ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization

    Authors: Zechun Liu, Changsheng Zhao, Hanxian Huang, Sijia Chen, Jing Zhang, Jiawei Zhao, Scott Roy, Lisa Jin, Yunyang Xiong, Yangyang Shi, Lin Xiao, Yuandong Tian, Bilge Soran, Raghuraman Krishnamoorthi, Tijmen Blankevoort, Vikas Chandra

    Abstract: The optimal bit-width for achieving the best trade-off between quantized model size and accuracy has been a subject of ongoing debate. While some advocate for 4-bit quantization, others propose that 1.58-bit offers superior results. However, the lack of a cohesive framework for different bits has left such conclusions relatively tenuous. We present ParetoQ, the first unified framework that facilit… ▽ More

    Submitted 4 February, 2025; originally announced February 2025.

  2. arXiv:2501.07256  [pdf, other

    cs.CV

    EdgeTAM: On-Device Track Anything Model

    Authors: Chong Zhou, Chenchen Zhu, Yunyang Xiong, Saksham Suri, Fanyi Xiao, Lemeng Wu, Raghuraman Krishnamoorthi, Bo Dai, Chen Change Loy, Vikas Chandra, Bilge Soran

    Abstract: On top of Segment Anything Model (SAM), SAM 2 further extends its capability from image to video inputs through a memory bank mechanism and obtains a remarkable performance compared with previous methods, making it a foundation model for video segmentation task. In this paper, we aim at making SAM 2 much more efficient so that it even runs on mobile devices while maintaining a comparable performan… ▽ More

    Submitted 13 January, 2025; originally announced January 2025.

    Comments: Code will be released at https://github.com/facebookresearch/EdgeTAM

  3. arXiv:2411.18933  [pdf, other

    cs.CV

    Efficient Track Anything

    Authors: Yunyang Xiong, Chong Zhou, Xiaoyu Xiang, Lemeng Wu, Chenchen Zhu, Zechun Liu, Saksham Suri, Balakrishnan Varadarajan, Ramya Akula, Forrest Iandola, Raghuraman Krishnamoorthi, Bilge Soran, Vikas Chandra

    Abstract: Segment Anything Model 2 (SAM 2) has emerged as a powerful tool for video object segmentation and tracking anything. Key components of SAM 2 that drive the impressive video object segmentation performance include a large multistage image encoder for frame feature extraction and a memory mechanism that stores memory contexts from past frames to help current frame segmentation. The high computation… ▽ More

    Submitted 28 November, 2024; originally announced November 2024.

  4. arXiv:2411.17713  [pdf, other

    cs.DC cs.AI

    Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations

    Authors: Igor Fedorov, Kate Plawiak, Lemeng Wu, Tarek Elgamal, Naveen Suda, Eric Smith, Hongyuan Zhan, Jianfeng Chi, Yuriy Hulovatyy, Kimish Patel, Zechun Liu, Changsheng Zhao, Yangyang Shi, Tijmen Blankevoort, Mahesh Pasupuleti, Bilge Soran, Zacharie Delpierre Coudert, Rachad Alao, Raghuraman Krishnamoorthi, Vikas Chandra

    Abstract: This paper presents Llama Guard 3-1B-INT4, a compact and efficient Llama Guard model, which has been open-sourced to the community during Meta Connect 2024. We demonstrate that Llama Guard 3-1B-INT4 can be deployed on resource-constrained devices, achieving a throughput of at least 30 tokens per second and a time-to-first-token of 2.5 seconds or less on a commodity Android mobile CPU. Notably, our… ▽ More

    Submitted 18 November, 2024; originally announced November 2024.

  5. arXiv:2410.17434  [pdf, other

    cs.CV

    LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

    Authors: Xiaoqian Shen, Yunyang Xiong, Changsheng Zhao, Lemeng Wu, Jun Chen, Chenchen Zhu, Zechun Liu, Fanyi Xiao, Balakrishnan Varadarajan, Florian Bordes, Zhuang Liu, Hu Xu, Hyunwoo J. Kim, Bilge Soran, Raghuraman Krishnamoorthi, Mohamed Elhoseiny, Vikas Chandra

    Abstract: Multimodal Large Language Models (MLLMs) have shown promising progress in understanding and analyzing video content. However, processing long videos remains a significant challenge constrained by LLM's context size. To address this limitation, we propose LongVU, a spatiotemporal adaptive compression mechanism thats reduces the number of video tokens while preserving visual details of long videos.… ▽ More

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: Project page: https://vision-cair.github.io/LongVU

  6. arXiv:2405.16406  [pdf, other

    cs.LG cs.AI cs.CL cs.CV

    SpinQuant: LLM quantization with learned rotations

    Authors: Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, Tijmen Blankevoort

    Abstract: Post-training quantization (PTQ) techniques applied to weights, activations, and the KV cache greatly reduce memory usage, latency, and power consumption of Large Language Models (LLMs), but may lead to large quantization errors when outliers are present. Rotating activation or weight matrices helps remove outliers and benefits quantization. In this work, we identify a collection of applicable rot… ▽ More

    Submitted 20 February, 2025; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: ICLR 2025

  7. arXiv:2312.06736  [pdf, other

    cs.CV

    SqueezeSAM: User friendly mobile interactive segmentation

    Authors: Balakrishnan Varadarajan, Bilge Soran, Forrest Iandola, Xiaoyu Xiang, Yunyang Xiong, Lemeng Wu, Chenchen Zhu, Raghuraman Krishnamoorthi, Vikas Chandra

    Abstract: The Segment Anything Model (SAM) has been a cornerstone in the field of interactive segmentation, propelling significant progress in generative AI, computational photography, and medical imaging. Despite its ability to process arbitrary user input and generate corresponding segmentation masks, SAM's 600 million parameter architecture, based on ViT-H, is not compatible with current mobile hardware… ▽ More

    Submitted 20 May, 2024; v1 submitted 11 December, 2023; originally announced December 2023.