
Showing 1–6 of 6 results for author: Soori, S

  1. arXiv:2501.14249  [pdf, other]

    cs.LG cs.AI cs.CL

    Humanity's Last Exam

    Authors: Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, Sean Shi, Michael Choi, Anish Agrawal, Arnav Chopra, Adam Khoja, Ryan Kim, Richard Ren, Jason Hausenloy, Oliver Zhang, Mantas Mazeika, Dmitry Dodonov, Tung Nguyen, Jaeho Lee, Daron Anderson, Mikhail Doroshenko, Alun Cennyth Stokes, et al. (1084 additional authors not shown)

    Abstract: Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of…

    Submitted 19 April, 2025; v1 submitted 24 January, 2025; originally announced January 2025.

    Comments: 29 pages, 6 figures

  2. arXiv:2108.09365  [pdf, other]

    math.OC cs.DC

    L-DQN: An Asynchronous Limited-Memory Distributed Quasi-Newton Method

    Authors: Bugra Can, Saeed Soori, Maryam Mehri Dehnavi, Mert Gürbüzbalaban

    Abstract: This work proposes a distributed algorithm for solving empirical risk minimization problems, called L-DQN, under the master/worker communication model. L-DQN is a distributed limited-memory quasi-Newton method that supports asynchronous computations among the worker nodes. Our method is efficient both in terms of storage and communication costs, i.e., in every iteration the master node and workers…

    Submitted 4 September, 2021; v1 submitted 20 August, 2021; originally announced August 2021.

    MSC Class: 68W15 (Primary)

  3. arXiv:2106.03947  [pdf, other]

    cs.LG

    TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion

    Authors: Saeed Soori, Bugra Can, Baourun Mu, Mert Gürbüzbalaban, Maryam Mehri Dehnavi

    Abstract: This work proposes a time-efficient Natural Gradient Descent method, called TENGraD, with linear convergence guarantees. Computing the inverse of the neural network's Fisher information matrix is expensive in NGD because the Fisher matrix is large. Approximate NGD methods such as KFAC attempt to improve NGD's running time and practical application by reducing the Fisher matrix inversion cost with…

    Submitted 3 March, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

  4. arXiv:1907.08526  [pdf, ps, other]

    cs.DC cs.LG

    ASYNC: A Cloud Engine with Asynchrony and History for Distributed Machine Learning

    Authors: Saeed Soori, Bugra Can, Mert Gurbuzbalaban, Maryam Mehri Dehnavi

    Abstract: ASYNC is a framework that supports the implementation of asynchrony and history for optimization methods on distributed computing platforms. The popularity of asynchronous optimization methods has increased in distributed machine learning. However, their applicability and practical experimentation on distributed systems are limited because current bulk-processing cloud engines do not provide a rob…

    Submitted 20 February, 2020; v1 submitted 19 July, 2019; originally announced July 2019.

  5. arXiv:1812.07152  [pdf, other]

    cs.DC

    MatRox: Modular approach for improving data locality in Hierarchical (Mat)rix App(Rox)imation

    Authors: Bangtian Liu, Kazem Cheshmi, Saeed Soori, Michelle Mills Strout, Maryam Mehri Dehnavi

    Abstract: Hierarchical matrix approximations have gained significant traction in the machine learning and scientific community as they exploit available low-rank structures in kernel methods to compress the kernel matrix. The resulting compressed matrix, HMatrix, is used to reduce the computational complexity of operations such as HMatrix-matrix multiplications with tuneable accuracy in an evaluation phase.…

    Submitted 30 November, 2019; v1 submitted 17 December, 2018; originally announced December 2018.

  6. arXiv:1710.08883  [pdf, other]

    cs.DC cs.LG math.NA math.OC

    Avoiding Communication in Proximal Methods for Convex Optimization Problems

    Authors: Saeed Soori, Aditya Devarakonda, James Demmel, Mert Gurbuzbalaban, Maryam Mehri Dehnavi

    Abstract: The fast iterative soft thresholding algorithm (FISTA) is used to solve convex regularized optimization problems in machine learning. Distributed implementations of the algorithm have become popular since they enable the analysis of large datasets. However, existing formulations of FISTA communicate data at every iteration which reduces its performance on modern distributed architectures. The comm…

    Submitted 24 October, 2017; originally announced October 2017.