Teaching LLMs How to Learn with Contextual Fine-Tuning

Authors: Younwoo Choi, Muhammad Adil Asif, Ziwen Han, John Willes, Rahul G. Krishnan

Abstract: Prompting Large Language Models (LLMs), or providing context on the expected model of operation, is an effective way to steer the outputs of such models to satisfy human desiderata after they have been trained. But in rapidly evolving domains, there is often a need to fine-tune LLMs to improve either the kind of knowledge in their memory or their abilities to perform open-ended reasoning in new domains. When humans learn new concepts, we often do so by linking the new material that we are studying to concepts we have already learned before. To that end, we ask, "can prompting help us teach LLMs how to learn?" In this work, we study a novel generalization of instruction tuning, called contextual fine-tuning, to fine-tune LLMs. Our method leverages instructional prompts designed to mimic human cognitive strategies in learning and problem-solving to guide the learning process during training, aiming to improve the model's interpretation and understanding of domain-specific knowledge. We empirically demonstrate that this simple yet effective modification improves the ability of LLMs to be fine-tuned rapidly on new datasets both within the medical and financial domains.

Submitted 11 March, 2025; originally announced March 2025.

Comments: ICLR 2025

arXiv:2312.03864

Geometry Matching for Multi-Embodiment Grasping

Authors: Maria Attarian, Muhammad Adil Asif, Jingzhou Liu, Ruthrash Hari, Animesh Garg, Igor Gilitschenski, Jonathan Tompson

Abstract: Many existing learning-based grasping approaches concentrate on a single embodiment, provide limited generalization to higher DoF end-effectors and cannot capture a diverse set of grasp modes. We tackle the problem of grasping using multiple embodiments by learning rich geometric representations for both objects and end-effectors using Graph Neural Networks. Our novel method - GeoMatch - applies supervised learning on grasping data from multiple embodiments, learning end-to-end contact point likelihood maps as well as conditional autoregressive predictions of grasps keypoint-by-keypoint. We compare our method against baselines that support multiple embodiments. Our approach performs better across three end-effectors, while also producing diverse grasps. Examples, including real robot demos, can be found at geo-match.github.io.

Submitted 6 December, 2023; originally announced December 2023.

Journal ref: 7th Annual Conference on Robot Learning, 2023

arXiv:2312.03140

FlexModel: A Framework for Interpretability of Distributed Large Language Models

Authors: Matthew Choi, Muhammad Adil Asif, John Willes, David Emerson

Abstract: With the growth of large language models, now incorporating billions of parameters, the hardware prerequisites for their training and deployment have seen a corresponding increase. Although existing tools facilitate model parallelization and distributed training, deeper model interactions, crucial for interpretability and responsible AI techniques, still demand thorough knowledge of distributed computing. This often hinders contributions from researchers with machine learning expertise but limited distributed computing background. Addressing this challenge, we present FlexModel, a software package providing a streamlined interface for engaging with models distributed across multi-GPU and multi-node configurations. The library is compatible with existing model distribution libraries and encapsulates PyTorch models. It exposes user-registerable HookFunctions to facilitate straightforward interaction with distributed model internals, bridging the gap between distributed and single-device model paradigms. Primarily, FlexModel enhances accessibility by democratizing model interactions and promotes more inclusive research in the domain of large-scale neural networks. The package is found at https://github.com/VectorInstitute/flex_model.

Submitted 5 December, 2023; originally announced December 2023.

Comments: 14 pages, 8 figures. To appear at the Socially Responsible Language Modelling Research (SoLaR) Workshop, 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:1911.05400

doi 10.1016/j.jfranklin.2020.11.012

Implicit Higher-Order Moment Matching Technique for Model Reduction of Quadratic-bilinear Systems

Authors: Mian Muhammad Arsalan Asif, Mian Ilyas Ahmad, Peter Benner, Lihong Feng, Tatjana Stykel

Abstract: We propose a projection based multi-moment matching method for model order reduction of quadratic-bilinear systems. The goal is to construct a reduced system that ensures higher-order moment matching for the multivariate transfer functions appearing in the input-output representation of the nonlinear system. An existing technique achieves this for the first two multivariate transfer functions, in what is called the symmetric form of the multivariate transfer functions. We extend this framework to an equivalent and simplified form, the regular form, which allows us to show moment matching for the first three multivariate transfer functions. Numerical results for three benchmark examples of quadratic-bilinear systems show that the proposed framework exhibits better performance with reduced computational cost in comparison to existing techniques.

Submitted 13 November, 2019; originally announced November 2019.

Comments: 19 pages, 11 subfigures in 6 figures, Journal

MSC Class: 35G50

Showing 1–4 of 4 results for author: Asif, M A