
Showing 1–9 of 9 results for author: Willes, J

Searching in archive cs.
  1. arXiv:2503.09032

    cs.LG cs.AI cs.CL

    Teaching LLMs How to Learn with Contextual Fine-Tuning

    Authors: Younwoo Choi, Muhammad Adil Asif, Ziwen Han, John Willes, Rahul G. Krishnan

    Abstract: Prompting Large Language Models (LLMs), or providing context on the expected model of operation, is an effective way to steer the outputs of such models to satisfy human desiderata after they have been trained. But in rapidly evolving domains, there is often need to fine-tune LLMs to improve either the kind of knowledge in their memory or their abilities to perform open ended reasoning in new doma…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: ICLR 2025

  2. arXiv:2412.09477

    cs.LG stat.ML

    Bayesian Optimization via Continual Variational Last Layer Training

    Authors: Paul Brunzema, Mikkel Jordahn, John Willes, Sebastian Trimpe, Jasper Snoek, James Harrison

    Abstract: Gaussian Processes (GPs) are widely seen as the state-of-the-art surrogate models for Bayesian optimization (BO) due to their ability to model uncertainty and their performance on tasks where correlations are easily captured (such as those defined by Euclidean metrics) and their ability to be efficiently updated online. However, the performance of GPs depends on the choice of kernel, and kernel se…

    Submitted 12 December, 2024; originally announced December 2024.

  3. arXiv:2406.02969

    cs.LG cs.AI cs.CL q-fin.CP q-fin.MF

    Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models

    Authors: Raeid Saqur, Anastasis Kratsios, Florian Krach, Yannick Limmer, Jacob-Junqi Tian, John Willes, Blanka Horvath, Frank Rudzicz

    Abstract: We propose MoE-F - a formalized mechanism for combining $N$ pre-trained Large Language Models (LLMs) for online time-series prediction by adaptively forecasting the best weighting of LLM predictions at every time step. Our mechanism leverages the conditional information in each expert's running performance to forecast the best combination of LLMs for predicting the time series in its next step. Di…

    Submitted 20 February, 2025; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: 33 pages, 5 Appendix sections

    MSC Class: 60J05; 60G35; 68T20; 68T42; 68T50

    ACM Class: I.2.6; I.2.7; G.3

  4. arXiv:2404.11599

    cs.LG cs.CV stat.ML

    Variational Bayesian Last Layers

    Authors: James Harrison, John Willes, Jasper Snoek

    Abstract: We introduce a deterministic variational formulation for training Bayesian last layer neural networks. This yields a sampling-free, single-pass model and loss that effectively improves uncertainty estimation. Our variational Bayesian last layer (VBLL) can be trained and evaluated with only quadratic complexity in last layer width, and is thus (nearly) computationally free to add to standard archit…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: International Conference on Learning Representations (ICLR) 2024

  5. arXiv:2312.03140

    cs.LG cs.AI cs.CL cs.DC

    FlexModel: A Framework for Interpretability of Distributed Large Language Models

    Authors: Matthew Choi, Muhammad Adil Asif, John Willes, David Emerson

    Abstract: With the growth of large language models, now incorporating billions of parameters, the hardware prerequisites for their training and deployment have seen a corresponding increase. Although existing tools facilitate model parallelization and distributed training, deeper model interactions, crucial for interpretability and responsible AI techniques, still demand thorough knowledge of distributed co…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: 14 pages, 8 figures. To appear at the Socially Responsible Language Modelling Research (SoLaR) Workshop, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  6. arXiv:2308.05711

    cs.LG eess.SY

    A Comparison of Classical and Deep Reinforcement Learning Methods for HVAC Control

    Authors: Marshall Wang, John Willes, Thomas Jiralerspong, Matin Moezzi

    Abstract: Reinforcement learning (RL) is a promising approach for optimizing HVAC control. RL offers a framework for improving system performance, reducing energy consumption, and enhancing cost efficiency. We benchmark two popular classical and deep RL methods (Q-Learning and Deep-Q-Networks) across multiple HVAC environments and explore the practical consideration of model hyper-parameter selection and re…

    Submitted 10 August, 2023; originally announced August 2023.

  7. arXiv:2209.12487

    cs.CE

    Tartarus: A Benchmarking Platform for Realistic And Practical Inverse Molecular Design

    Authors: AkshatKumar Nigam, Robert Pollice, Gary Tom, Kjell Jorner, John Willes, Luca A. Thiede, Anshul Kundaje, Alan Aspuru-Guzik

    Abstract: The efficient exploration of chemical space to design molecules with intended properties enables the accelerated discovery of drugs, materials, and catalysts, and is one of the most important outstanding challenges in chemistry. Encouraged by the recent surge in computer power and artificial intelligence development, many algorithms have been developed to tackle this problem. However, despite the…

    Submitted 11 October, 2023; v1 submitted 26 September, 2022; originally announced September 2022.

    Comments: 29+21 pages, 6+19 figures, 6+2 tables

  8. arXiv:2208.08041

    cs.CV

    InterTrack: Interaction Transformer for 3D Multi-Object Tracking

    Authors: John Willes, Cody Reading, Steven L. Waslander

    Abstract: 3D multi-object tracking (MOT) is a key problem for autonomous vehicles, required to perform well-informed motion planning in dynamic environments. Particularly for densely occupied scenes, associating existing tracks to new detections remains challenging as existing systems tend to omit critical contextual information. Our proposed solution, InterTrack, introduces the Interaction Transformer for…

    Submitted 6 May, 2023; v1 submitted 16 August, 2022; originally announced August 2022.

    Comments: Accepted to CRV 2023

  9. arXiv:2107.13682

    cs.CV

    Bayesian Embeddings for Few-Shot Open World Recognition

    Authors: John Willes, James Harrison, Ali Harakeh, Chelsea Finn, Marco Pavone, Steven Waslander

    Abstract: As autonomous decision-making agents move from narrow operating environments to unstructured worlds, learning systems must move from a closed-world formulation to an open-world and few-shot setting in which agents continuously learn new classes from small amounts of information. This stands in stark contrast to modern machine learning systems that are typically designed with a known set of classes…

    Submitted 5 October, 2022; v1 submitted 28 July, 2021; originally announced July 2021.