Skip to main content

Showing 1–3 of 3 results for author: Ocejo, B

Searching in archive cs. Search in all archives.
.
  1. arXiv:2502.03417  [pdf, other

    cs.LG

    From Features to Transformers: Redefining Ranking for Scalable Impact

    Authors: Fedor Borisyuk, Lars Hertel, Ganesh Parameswaran, Gaurav Srivastava, Sudarshan Srinivasa Ramanujam, Borja Ocejo, Peng Du, Andrei Akterskii, Neil Daftary, Shao Tang, Daqi Sun, Qiang Charles Xiao, Deepesh Nathani, Mohit Kothari, Yun Dai, Aman Gupta

    Abstract: We present LiGR, a large-scale ranking framework developed at LinkedIn that brings state-of-the-art transformer-based modeling architectures into production. We introduce a modified transformer architecture that incorporates learned normalization and simultaneous set-wise attention to user history and ranked items. This architecture enables several breakthrough achievements, including: (1) the dep… ▽ More

    Submitted 5 February, 2025; originally announced February 2025.

  2. arXiv:2401.12332  [pdf, other

    cs.LG math.OC

    A Precise Characterization of SGD Stability Using Loss Surface Geometry

    Authors: Gregory Dexter, Borja Ocejo, Sathiya Keerthi, Aman Gupta, Ayan Acharya, Rajiv Khanna

    Abstract: Stochastic Gradient Descent (SGD) stands as a cornerstone optimization algorithm with proven real-world empirical successes but relatively limited theoretical understanding. Recent research has illuminated a key factor contributing to its practical efficacy: the implicit regularization it instigates. Several studies have investigated the linear stability property of SGD in the vicinity of a statio… ▽ More

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: To appear at ICLR 2024

  3. arXiv:2302.09693  [pdf, other

    stat.ML cs.LG

    mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization

    Authors: Kayhan Behdin, Qingquan Song, Aman Gupta, Sathiya Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, David Durfee, Rahul Mazumder

    Abstract: Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function that steers gradient descent methods toward flatter minima, which are believed to exhibit enhanced generalization prowess. Our study delves into a specific variant of SAM known as… ▽ More

    Submitted 30 September, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2212.04343