Showing 1–12 of 12 results for author: Rosenberg, A

Searching in archive stat.
  1. arXiv:2409.02392  [pdf, other]

    cs.LG stat.ML

    Building Math Agents with Multi-Turn Iterative Preference Learning

    Authors: Wei Xiong, Chengshuai Shi, Jiaming Shen, Aviv Rosenberg, Zhen Qin, Daniele Calandriello, Misha Khalman, Rishabh Joshi, Bilal Piot, Mohammad Saleh, Chi Jin, Tong Zhang, Tianqi Liu

    Abstract: Recent studies have shown that large language models' (LLMs) mathematical problem-solving capabilities can be enhanced by integrating external tools, such as code interpreters, and employing multi-turn Chain-of-Thought (CoT) reasoning. While current methods focus on synthetic data generation and Supervised Fine-Tuning (SFT), this paper studies the complementary direct preference learning approach…

    Submitted 27 February, 2025; v1 submitted 3 September, 2024; originally announced September 2024.

    Comments: A multi-turn direct preference learning framework for tool-integrated reasoning tasks

  2. arXiv:2407.03065  [pdf, ps, other]

    cs.LG stat.ML

    Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes

    Authors: Asaf Cassel, Aviv Rosenberg

    Abstract: Policy Optimization (PO) methods are among the most popular Reinforcement Learning (RL) algorithms in practice. Recently, Sherman et al. [2023a] proposed a PO-based algorithm with rate-optimal regret guarantees under the linear Markov Decision Process (MDP) model. However, their algorithm relies on a costly pure exploration warm-up phase that is hard to implement in practice. This paper eliminates…

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2307.01037  [pdf, other]

    stat.ME cs.LG

    Vector Quantile Regression on Manifolds

    Authors: Marco Pegoraro, Sanketh Vedula, Aviv A. Rosenberg, Irene Tallini, Emanuele Rodolà, Alex M. Bronstein

    Abstract: Quantile regression (QR) is a statistical tool for distribution-free estimation of conditional quantiles of a target variable given explanatory features. QR is limited by the assumption that the target distribution is univariate and defined on an Euclidean domain. Although the notion of quantiles was recently extended to multi-variate distributions, QR for multi-variate distributions on manifolds…

    Submitted 7 February, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

  4. arXiv:2208.01220  [pdf, other]

    stat.ML cs.LG eess.SP

    GeoECG: Data Augmentation via Wasserstein Geodesic Perturbation for Robust Electrocardiogram Prediction

    Authors: Jiacheng Zhu, Jielin Qiu, Zhuolin Yang, Douglas Weber, Michael A. Rosenberg, Emerson Liu, Bo Li, Ding Zhao

    Abstract: There has been an increased interest in applying deep neural networks to automatically interpret and analyze the 12-lead electrocardiogram (ECG). The current paradigms with machine learning methods are often limited by the amount of labeled data. This phenomenon is particularly problematic for clinically-relevant data, where labeling at scale can be time-consuming and costly in terms of the specia…

    Submitted 10 August, 2022; v1 submitted 1 August, 2022; originally announced August 2022.

    Comments: 26 pages, Figure 13, Machine Learning for Healthcare 2022

    Journal ref: Machine Learning for Healthcare 2022, JMLR Volume 182

  5. arXiv:2205.14977  [pdf, other]

    stat.CO cs.LG stat.ML

    Fast Nonlinear Vector Quantile Regression

    Authors: Aviv A. Rosenberg, Sanketh Vedula, Yaniv Romano, Alex M. Bronstein

    Abstract: Quantile regression (QR) is a powerful tool for estimating one or more conditional quantiles of a target variable $\mathrm{Y}$ given explanatory features $\boldsymbol{\mathrm{X}}$. A limitation of QR is that it is only defined for scalar target variables, due to the formulation of its objective function, and since the notion of quantiles has no standard definition for multivariate distributions. R…

    Submitted 2 June, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: 35 pages, 15 figures, code: https://github.com/vistalab-technion/vqr

    Journal ref: The Eleventh International Conference on Learning Representations (ICLR 2023)

  6. arXiv:2009.05986  [pdf, other]

    cs.LG stat.ML

    Oracle-Efficient Regret Minimization in Factored MDPs with Unknown Structure

    Authors: Aviv Rosenberg, Yishay Mansour

    Abstract: We study regret minimization in non-episodic factored Markov decision processes (FMDPs), where all existing algorithms make the strong assumption that the factored structure of the FMDP is known to the learner in advance. In this paper, we provide the first algorithm that learns the structure of the FMDP while minimizing the regret. Our algorithm is based on the optimism in face of uncertainty pri…

    Submitted 11 October, 2021; v1 submitted 13 September, 2020; originally announced September 2020.

    Comments: NeurIPS 2021

  7. arXiv:2006.11561  [pdf, ps, other]

    cs.LG stat.ML

    Stochastic Shortest Path with Adversarially Changing Costs

    Authors: Aviv Rosenberg, Yishay Mansour

    Abstract: Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost. In this paper we present the adversarial SSP model that also accounts for adversarial changes in the costs over time, while the underlying transition function remains unchanged. Formally, an agent interacts with an SSP environment for $K$ episo…

    Submitted 5 April, 2022; v1 submitted 20 June, 2020; originally announced June 2020.

  8. arXiv:2002.09869  [pdf, ps, other]

    cs.LG stat.ML

    Near-optimal Regret Bounds for Stochastic Shortest Path

    Authors: Alon Cohen, Haim Kaplan, Yishay Mansour, Aviv Rosenberg

    Abstract: Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost. In the learning formulation of the problem, the agent is unaware of the environment dynamics (i.e., the transition function) and has to repeatedly play for a given number of episodes while reasoning about the problem's optimal solution. Unlike…

    Submitted 23 February, 2020; originally announced February 2020.

  9. arXiv:2002.08243  [pdf, ps, other]

    cs.LG stat.ML

    Optimistic Policy Optimization with Bandit Feedback

    Authors: Yonathan Efroni, Lior Shani, Aviv Rosenberg, Shie Mannor

    Abstract: Policy optimization methods are one of the most widely used classes of Reinforcement Learning (RL) algorithms. Yet, so far, such methods have been mostly analyzed from an optimization perspective, without addressing the problem of exploration, or by making strong assumptions on the interaction with the environment. In this paper we consider model-based RL in the tabular finite-horizon MDP setting…

    Submitted 18 June, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

    Comments: Accepted to ICML 2020

  10. arXiv:2002.03788  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior

    Authors: Guangzhi Sun, Yu Zhang, Ron J. Weiss, Yuan Cao, Heiga Zen, Andrew Rosenberg, Bhuvana Ramabhadran, Yonghui Wu

    Abstract: Recent neural text-to-speech (TTS) models with fine-grained latent features enable precise control of the prosody of synthesized speech. Such models typically incorporate a fine-grained variational autoencoder (VAE) structure, extracting latent features at each input token (e.g., phonemes). However, generating samples with the standard VAE prior often results in unnatural and discontinuous speech,…

    Submitted 6 February, 2020; originally announced February 2020.

    Comments: To appear in ICASSP 2020

  11. arXiv:1905.07773  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Online Convex Optimization in Adversarial Markov Decision Processes

    Authors: Aviv Rosenberg, Yishay Mansour

    Abstract: We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes, and the transition function is not known to the learner. We show $\tilde{O}(L|X|\sqrt{|A|T})$ regret bound, where $T$ is the number of episodes, $X$ is the state space, $A$ is the action space, and $L$ is the length of each episode. Our online algorit…

    Submitted 19 May, 2019; originally announced May 2019.

  12. arXiv:1301.6282  [pdf, other]

    stat.CO

    AABC: approximate approximate Bayesian computation when simulating a large number of data sets is computationally infeasible

    Authors: Erkan O. Buzbas, Noah A. Rosenberg

    Abstract: Approximate Bayesian computation (ABC) methods perform inference on model-specific parameters of mechanistically motivated parametric statistical models when evaluating likelihoods is difficult. Central to the success of ABC methods is computationally inexpensive simulation of data sets from the parametric model of interest. However, when simulating data sets from a model is so computationally exp…

    Submitted 26 January, 2013; originally announced January 2013.