Showing 1–10 of 10 results for author: Mehta, V

Searching in archive stat.
  1. arXiv:2505.16914  [pdf, ps, other]

    stat.ME

    Exposure measurement error correction in longitudinal studies with discrete outcomes

    Authors: Ce Yang, Ning Zhang, Jiaxuan Li, Unnati V. Mehta, Jaime E. Hart, Donna Spiegelman, Molin Wang

    Abstract: Environmental epidemiologists are often interested in estimating the effect of time-varying functions of the exposure history on health outcomes. However, the individual exposure measurements that constitute the history upon which an exposure history function is constructed are usually subject to measurement errors. To obtain unbiased estimates of the effects of such mismeasured functions in longi…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 22 pages, has Supplementary

  2. arXiv:2312.00267  [pdf, other]

    cs.LG cs.AI stat.ML

    Sample Efficient Preference Alignment in LLMs via Active Exploration

    Authors: Viraj Mehta, Syrine Belakaria, Vikramjeet Das, Ojash Neopane, Yijia Dai, Ilija Bogunovic, Barbara Engelhardt, Stefano Ermon, Jeff Schneider, Willie Neiswanger

    Abstract: Preference-based feedback is important for many applications in machine learning where evaluation of a reward function is not feasible. Notable recent examples arise in preference alignment for large language models, including in reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO). For many applications of preference alignment, the cost of acquiring human fee…

    Submitted 20 March, 2025; v1 submitted 30 November, 2023; originally announced December 2023.

  3. arXiv:2307.11288  [pdf, other]

    cs.LG cs.AI stat.ML

    Kernelized Offline Contextual Dueling Bandits

    Authors: Viraj Mehta, Ojash Neopane, Vikramjeet Das, Sen Lin, Jeff Schneider, Willie Neiswanger

    Abstract: Preference-based feedback is important for many applications where direct evaluation of a reward function is not feasible. A notable recent example arises in reinforcement learning from human feedback on large language models. For many of these applications, the cost of acquiring the human feedback can be substantial or even prohibitive. In this work, we take advantage of the fact that often the a…

    Submitted 20 July, 2023; originally announced July 2023.

  4. arXiv:2212.09510  [pdf, other]

    stat.ML cs.AI cs.LG

    Near-optimal Policy Identification in Active Reinforcement Learning

    Authors: Xiang Li, Viraj Mehta, Johannes Kirschner, Ian Char, Willie Neiswanger, Jeff Schneider, Andreas Krause, Ilija Bogunovic

    Abstract: Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a generative model. We propose the AE-LSVI algorithm…

    Submitted 19 December, 2022; originally announced December 2022.

  5. arXiv:2210.04642  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Exploration via Planning for Information about the Optimal Trajectory

    Authors: Viraj Mehta, Ian Char, Joseph Abbate, Rory Conlin, Mark D. Boyer, Stefano Ermon, Jeff Schneider, Willie Neiswanger

    Abstract: Many potential applications of reinforcement learning (RL) are stymied by the large numbers of samples required to learn an effective policy. This is especially true when applying RL to real-world control tasks, e.g. in the sciences or robotics, where executing a policy in the environment is costly. In popular RL algorithms, agents typically explore either by adding stochasticity to a reward-maxim…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: Conference paper at NeurIPS 2022. Code available at https://github.com/fusion-ml/trajectory-information-rl. arXiv admin note: text overlap with arXiv:2112.05244

  6. arXiv:2112.06868  [pdf, other]

    cs.LG stat.ML

    Variational autoencoders in the presence of low-dimensional data: landscape and implicit bias

    Authors: Frederic Koehler, Viraj Mehta, Chenghui Zhou, Andrej Risteski

    Abstract: Variational Autoencoders are one of the most commonly used generative models, particularly for image data. A prominent difficulty in training VAEs is data that is supported on a lower-dimensional manifold. Recent work by Dai and Wipf (2020) proposes a two-stage training algorithm for VAEs, based on a conjecture that in standard VAE training the generator will converge to a solution with 0 variance…

    Submitted 17 May, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Accepted as a conference paper at ICLR 2022

  7. arXiv:2112.05244  [pdf, other]

    cs.LG cs.AI cs.IT cs.RO stat.ML

    An Experimental Design Perspective on Model-Based Reinforcement Learning

    Authors: Viraj Mehta, Biswajit Paria, Jeff Schneider, Stefano Ermon, Willie Neiswanger

    Abstract: In many practical applications of RL, it is expensive to observe state transitions from the environment. For example, in the problem of plasma control for nuclear fusion, computing the next state for a given state-action pair requires querying an expensive transition function which can lead to many hours of computer simulation or dollars of scientific research. Such expensive data collection prohi…

    Submitted 15 March, 2022; v1 submitted 9 December, 2021; originally announced December 2021.

    Comments: Conference paper at ICLR 2022

  8. arXiv:2010.01155  [pdf, other]

    cs.LG stat.ML

    Representational aspects of depth and conditioning in normalizing flows

    Authors: Frederic Koehler, Viraj Mehta, Andrej Risteski

    Abstract: Normalizing flows are among the most popular paradigms in generative modeling, especially for images, primarily because we can efficiently evaluate the likelihood of a data point. This is desirable both for evaluating the fit of a model, and for ease of training, as maximizing the likelihood can be done by gradient descent. However, training normalizing flows comes with difficulties as well: model…

    Submitted 25 June, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

    Comments: Appeared in ICML 2021

  9. Neural Dynamical Systems: Balancing Structure and Flexibility in Physical Prediction

    Authors: Viraj Mehta, Ian Char, Willie Neiswanger, Youngseog Chung, Andrew Oakleigh Nelson, Mark D Boyer, Egemen Kolemen, Jeff Schneider

    Abstract: We introduce Neural Dynamical Systems (NDS), a method of learning dynamical models in various gray-box settings which incorporates prior knowledge in the form of systems of ordinary differential equations. NDS uses neural networks to estimate free parameters of the system, predicts residual terms, and numerically integrates over time to predict future states. A key insight is that many real dynami…

    Submitted 27 April, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

  10. arXiv:1806.09266  [pdf, other]

    cs.RO cs.CV cs.LG stat.ML

    Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision

    Authors: Kuan Fang, Yuke Zhu, Animesh Garg, Andrey Kurenkov, Viraj Mehta, Li Fei-Fei, Silvio Savarese

    Abstract: Tool manipulation is vital for facilitating robots to complete challenging task goals. It requires reasoning about the desired effect of the task and thus properly grasping and manipulating the tool to achieve the task. Task-agnostic grasping optimizes for grasp robustness while ignoring crucial task-specific constraints. In this paper, we propose the Task-Oriented Grasping Network (TOG-Net) to jo…

    Submitted 24 June, 2018; originally announced June 2018.

    Comments: RSS 2018