
Showing 1–4 of 4 results for author: Yassin, Y

  1. arXiv:2505.18051  [pdf, other]

    cs.CV

    LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision

    Authors: Anthony Fuller, Yousef Yassin, Junfeng Wen, Daniel G. Kyrollos, Tarek Ibrahim, James R. Green, Evan Shelhamer

    Abstract: Vision transformers are ever larger, more accurate, and more expensive to compute. The expense is even more extreme at high resolution as the number of tokens grows quadratically with the image size. We turn to adaptive computation to cope with this cost by learning to predict where to compute. Our LookWhere method divides the computation between a low-resolution selector and a high-resolution ext…

    Submitted 23 May, 2025; originally announced May 2025.

  2. arXiv:2505.17443  [pdf, ps, other]

    cs.DS cs.LG

    Corporate Needs You to Find the Difference: Revisiting Submodular and Supermodular Ratio Optimization Problems

    Authors: Elfarouk Harb, Yousef Yassin, Chandra Chekuri

    Abstract: We study the problem of minimizing or maximizing the average value $ f(S)/|S| $ of a submodular or supermodular set function $ f: 2^V \to \mathbb{R} $ over non-empty subsets $ S \subseteq V $. This generalizes classical problems such as Densest Subgraph (DSG), Densest Supermodular Set (DSS), and Submodular Function Minimization (SFM). Motivated by recent applications, we introduce two broad formul…

    Submitted 22 May, 2025; originally announced May 2025.

  3. arXiv:2502.15021  [pdf, other]

    cs.CV

    Simpler Fast Vision Transformers with a Jumbo CLS Token

    Authors: Anthony Fuller, Yousef Yassin, Daniel G. Kyrollos, Evan Shelhamer, James R. Green

    Abstract: We introduce a simple enhancement of vision transformers (ViTs) to improve accuracy while maintaining throughput. Our approach, Jumbo, creates a wider CLS token, which is split to match the patch token width before attention, processed with self-attention, and reassembled. After attention, Jumbo applies a dedicated, wider FFN to this token. Since there is only one Jumbo token, its cost is minimal,…

    Submitted 23 May, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

  4. arXiv:2405.13985  [pdf, other]

    cs.CV

    LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate

    Authors: Anthony Fuller, Daniel G. Kyrollos, Yousef Yassin, James R. Green

    Abstract: High-resolution images offer more information about scenes that can improve model accuracy. However, the dominant model architecture in computer vision, the vision transformer (ViT), cannot effectively leverage larger images without finetuning -- ViTs poorly extrapolate to more patches at test time, although transformers offer sequence length flexibility. We attribute this shortcoming to the curre…

    Submitted 29 October, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: NeurIPS 2024 Camera Ready