-
Beyond Relations: A Case for Elevating to the Entity-Relationship Abstraction
Authors:
Amol Deshpande
Abstract:
Spurred by a number of recent trends, we make the case that the relational database systems should urgently move beyond supporting the basic object-relational model and instead embrace a more abstract data model, specifically, the entity-relationship model. We argue that the current RDBMSs don't inherently support sufficient "logical" data independence, and that is relegating the database systems…
▽ More
Spurred by a number of recent trends, we make the case that the relational database systems should urgently move beyond supporting the basic object-relational model and instead embrace a more abstract data model, specifically, the entity-relationship model. We argue that the current RDBMSs don't inherently support sufficient "logical" data independence, and that is relegating the database systems to the role of a backend storage system, away from where significant innovation is both happening and is still needed. We present the design of a prototype system (ErbiumDB) that we are building to explore these issues, and discuss some of the key research challenges.
△ Less
Submitted 6 May, 2025;
originally announced May 2025.
-
BiFlex: A Passive Bimodal Stiffness Flexible Wrist for Manipulation in Unstructured Environments
Authors:
Gu-Cheol Jeong,
Stefano Dalla Gasperina,
Ashish D. Deshpande,
Lillian Chin,
Roberto Martín-Martín
Abstract:
Robotic manipulation in unstructured, humancentric environments poses a dual challenge: achieving the precision need for delicate free-space operation while ensuring safety during unexpected contact events. Traditional wrists struggle to balance these demands, often relying on complex control schemes or complicated mechanical designs to mitigate potential damage from force overload. In response, w…
▽ More
Robotic manipulation in unstructured, humancentric environments poses a dual challenge: achieving the precision need for delicate free-space operation while ensuring safety during unexpected contact events. Traditional wrists struggle to balance these demands, often relying on complex control schemes or complicated mechanical designs to mitigate potential damage from force overload. In response, we present BiFlex, a flexible robotic wrist that uses a soft buckling honeycomb structure to provides a natural bimodal stiffness response. The higher stiffness mode enables precise household object manipulation, while the lower stiffness mode provides the compliance needed to adapt to external forces. We design BiFlex to maintain a fingertip deflection of less than 1 cm while supporting loads up to 500g and create a BiFlex wrist for many grippers, including Panda, Robotiq, and BaRiFlex. We validate BiFlex under several real-world experimental evaluations, including surface wiping, precise pick-and-place, and grasping under environmental constraints. We demonstrate that BiFlex simplifies control while maintaining precise object manipulation and enhanced safety in real-world applications.
△ Less
Submitted 14 May, 2025; v1 submitted 11 April, 2025;
originally announced April 2025.
-
Implementation of Support Vector Machines using Reaction Networks
Authors:
Amey Choudhary,
Jiaxin Jin,
Abhishek Deshpande
Abstract:
Can machine learning algorithms be implemented using chemical reaction networks? We demonstrate that this is possible in the case of support vector machines (SVMs). SVMs are powerful tools for data classification, leveraging VC theory to handle high-dimensional data and small datasets effectively. In this work, we propose a reaction network scheme for implementing SVMs, utilizing the steady-state…
▽ More
Can machine learning algorithms be implemented using chemical reaction networks? We demonstrate that this is possible in the case of support vector machines (SVMs). SVMs are powerful tools for data classification, leveraging VC theory to handle high-dimensional data and small datasets effectively. In this work, we propose a reaction network scheme for implementing SVMs, utilizing the steady-state behavior of reaction network dynamics to model key computational aspects of SVMs. This approach introduces a novel biochemical framework for implementing machine learning algorithms in non-traditional computational environments.
△ Less
Submitted 24 March, 2025;
originally announced March 2025.
-
Enforcing Cybersecurity Constraints for LLM-driven Robot Agents for Online Transactions
Authors:
Shraddha Pradipbhai Shah,
Aditya Vilas Deshpande
Abstract:
The integration of Large Language Models (LLMs) into autonomous robotic agents for conducting online transactions poses significant cybersecurity challenges. This study aims to enforce robust cybersecurity constraints to mitigate the risks associated with data breaches, transaction fraud, and system manipulation. The background focuses on the rise of LLM-driven robotic systems in e-commerce, finan…
▽ More
The integration of Large Language Models (LLMs) into autonomous robotic agents for conducting online transactions poses significant cybersecurity challenges. This study aims to enforce robust cybersecurity constraints to mitigate the risks associated with data breaches, transaction fraud, and system manipulation. The background focuses on the rise of LLM-driven robotic systems in e-commerce, finance, and service industries, alongside the vulnerabilities they introduce. A novel security architecture combining blockchain technology with multi-factor authentication (MFA) and real-time anomaly detection was implemented to safeguard transactions. Key performance metrics such as transaction integrity, response time, and breach detection accuracy were evaluated, showing improved security and system performance. The results highlight that the proposed architecture reduced fraudulent transactions by 90%, improved breach detection accuracy to 98%, and ensured secure transaction validation within a latency of 0.05 seconds. These findings emphasize the importance of cybersecurity in the deployment of LLM-driven robotic systems and suggest a framework adaptable to various online platforms.
△ Less
Submitted 16 March, 2025;
originally announced March 2025.
-
Sensing-Based Beamformed Resource Allocation in Standalone Millimeter-Wave Vehicular Networks
Authors:
Alessandro Traspadini,
Anay Ajit Deshpande,
Marco Giordani,
Chinmay Mahabal,
Takayuki Shimizu,
Michele Zorzi
Abstract:
In 3GPP New Radio (NR) Vehicle-to-Everything (V2X), the new standard for next-generation vehicular networks, vehicles can autonomously select sidelink resources for data transmission, which permits network operations without cellular coverage. However, standalone resource allocation is uncoordinated, and is complicated by the high mobility of the nodes that may introduce unforeseen channel collisi…
▽ More
In 3GPP New Radio (NR) Vehicle-to-Everything (V2X), the new standard for next-generation vehicular networks, vehicles can autonomously select sidelink resources for data transmission, which permits network operations without cellular coverage. However, standalone resource allocation is uncoordinated, and is complicated by the high mobility of the nodes that may introduce unforeseen channel collisions (e.g., when a transmitting vehicle changes path) or free up resources (e.g., when a vehicle moves outside of the communication area). Moreover, unscheduled resource allocation is prone to the hidden node and exposed node problems, which are particularly critical considering directional transmissions. In this paper, we implement and demonstrate a new channel access scheme for NR V2X in Frequency Range 2 (FR2), i.e., at millimeter wave (mmWave) frequencies, based on directional and beamformed transmissions along with Sidelink Control Information (SCI) to select resources for transmission. We prove via simulation that this approach can reduce the probability of collision for resource allocation, compared to a baseline solution that does not configure SCI transmissions.
△ Less
Submitted 19 March, 2025;
originally announced March 2025.
-
TreeCat: Standalone Catalog Engine for Large Data Systems
Authors:
Keonwoo Oh,
Pooja Nilangekar,
Amol Deshpande
Abstract:
With ever increasing volume and heterogeneity of data, advent of new specialized compute engines, and demand for complex use cases, large scale data systems require a performant catalog system that can satisfy diverse needs. We argue that existing solutions, including recent lakehouse storage formats, have fundamental limitations and that there is a strong motivation for a specialized database eng…
▽ More
With ever increasing volume and heterogeneity of data, advent of new specialized compute engines, and demand for complex use cases, large scale data systems require a performant catalog system that can satisfy diverse needs. We argue that existing solutions, including recent lakehouse storage formats, have fundamental limitations and that there is a strong motivation for a specialized database engine, dedicated to serve as the catalog. We present the design and implementation of TreeCat, a database engine that features a hierarchical data model with a path-based query language, a storage format optimized for efficient range queries and versioning, and a correlated scan operation that enables fast query execution. A key performance challenge is supporting concurrent read and write operations from many different clients while providing strict consistency guarantees. To this end, we present a novel MVOCC (multi-versioned optimistic concurrency control) protocol that guarantees serializable isolation. We conduct a comprehensive experimental evaluation comparing our concurrency control scheme with prior techniques, and evaluating our overall system against Hive Metastore, Delta Lake, and Iceberg.
△ Less
Submitted 4 March, 2025;
originally announced March 2025.
-
Tensor Network Structure Search Using Program Synthesis
Authors:
Zheng Guo,
Aditya Deshpande,
Brian Kiedrowski,
Xinyu Wang,
Alex Gorodetsky
Abstract:
Tensor networks provide a powerful framework for compressing multi-dimensional data. The optimal tensor network structure for a given data tensor depends on both the inherent data properties and the specific optimality criteria, making tensor network structure search a crucial research problem. Existing solutions typically involve sampling and validating numerous candidate structures; this is comp…
▽ More
Tensor networks provide a powerful framework for compressing multi-dimensional data. The optimal tensor network structure for a given data tensor depends on both the inherent data properties and the specific optimality criteria, making tensor network structure search a crucial research problem. Existing solutions typically involve sampling and validating numerous candidate structures; this is computationally expensive, limiting their practical applications. We address this challenge by formulating tensor network structure search as a program synthesis problem and proposing a highly efficient validation method that is based on constraint solving. Specifically, we design a domain specific language: it builds the correspondence between programs and network structures, and uses a novel idea of output-directed splits to compress the search space without hindering the expressiveness. We then propose a synthesis algorithm that can prioritize promising candidates through constraint solving. % Experimental results show that our approach improves search speed by $10\times$ and achieves compression ratios by $1.5\times$ to $3\times$ better than state-of-the-art. Notably, our approach scales to larger tensors that are out of reach by prior work. Finally, we demonstrate that the discovered topologies generalize to data from the same source, achieving compression ratios up to $ 2.4\times$ better than hierarchical Tuckers while maintaining the runtime around $110$ seconds.
△ Less
Submitted 22 February, 2025; v1 submitted 4 February, 2025;
originally announced February 2025.
-
Optimizing Queries with Many-to-Many Joins
Authors:
Hasara Kalumin,
Amol Deshpande
Abstract:
As database query processing techniques are being used to handle diverse workloads, a key emerging challenge is how to efficiently handle multi-way join queries containing multiple many-to-many joins. While uncommon in traditional enterprise settings that have been the focus of much of the query optimization work to date, such queries are seen frequently in other contexts such as graph workloads.…
▽ More
As database query processing techniques are being used to handle diverse workloads, a key emerging challenge is how to efficiently handle multi-way join queries containing multiple many-to-many joins. While uncommon in traditional enterprise settings that have been the focus of much of the query optimization work to date, such queries are seen frequently in other contexts such as graph workloads. This has led to much work on developing join algorithms for handling cyclic queries, on compressed (factorized) representations for more efficient storage of intermediate results, and on use of semi-joins or predicate transfer to avoid generating large redundant intermediate results. In this paper, we address a core query optimization problem in this context. Specifically, we introduce an improved cost model that more accurately captures the cost of a query plan in such scenarios, and we present several optimization algorithms for query optimization that incorporate these new cost functions. We present an extensive experimental evaluation, that compares the factorized representation approach with a full semi-join reduction approach as well as to an approach that uses bitvectors to eliminate tuples early through sideways information passing. We also present new analyses of robustness of these techniques to the choice of the join order, potentially eliminating the need for more complex query optimization and selectivity estimation techniques.
△ Less
Submitted 24 April, 2025; v1 submitted 20 December, 2024;
originally announced December 2024.
-
Hybrid deep learning-based strategy for the hepatocellular carcinoma cancer grade classification of H&E stained liver histopathology images
Authors:
Ajinkya Deshpande,
Deep Gupta,
Ankit Bhurane,
Nisha Meshram,
Sneha Singh,
Petia Radeva
Abstract:
Hepatocellular carcinoma (HCC) is a common type of liver cancer whose early-stage diagnosis is a common challenge, mainly due to the manual assessment of hematoxylin and eosin-stained whole slide images, which is a time-consuming process and may lead to variability in decision-making. For accurate detection of HCC, we propose a hybrid deep learning-based architecture that uses transfer learning to…
▽ More
Hepatocellular carcinoma (HCC) is a common type of liver cancer whose early-stage diagnosis is a common challenge, mainly due to the manual assessment of hematoxylin and eosin-stained whole slide images, which is a time-consuming process and may lead to variability in decision-making. For accurate detection of HCC, we propose a hybrid deep learning-based architecture that uses transfer learning to extract the features from pre-trained convolutional neural network (CNN) models and a classifier made up of a sequence of fully connected layers. This study uses a publicly available The Cancer Genome Atlas Hepatocellular Carcinoma (TCGA-LIHC)database (n=491) for model development and database of Kasturba Gandhi Medical College (KMC), India for validation. The pre-processing step involves patch extraction, colour normalization, and augmentation that results in 3920 patches for the TCGA dataset. The developed hybrid deep neural network consisting of a CNN-based pre-trained feature extractor and a customized artificial neural network-based classifier is trained using five-fold cross-validation. For this study, eight different state-of-the-art models are trained and tested as feature extractors for the proposed hybrid model. The proposed hybrid model with ResNet50-based feature extractor provided the sensitivity, specificity, F1-score, accuracy, and AUC of 100.00%, 100.00%, 100.00%, 100.00%, and 1.00, respectively on the TCGA database. On the KMC database, EfficientNetb3 resulted in the optimal choice of the feature extractor giving sensitivity, specificity, F1-score, accuracy, and AUC of 96.97, 98.85, 96.71, 96.71, and 0.99, respectively. The proposed hybrid models showed improvement in accuracy of 2% and 4% over the pre-trained models in TCGA-LIHC and KMC databases.
△ Less
Submitted 28 February, 2025; v1 submitted 4 December, 2024;
originally announced December 2024.
-
A conditional Generative Adversarial network model for the Weather4Cast 2024 Challenge
Authors:
Atharva Deshpande,
Kaushik Gopalan,
Jeet Shah,
Hrishikesh Simu
Abstract:
This study explores the application of deep learning for rainfall prediction, leveraging the Spinning Enhanced Visible and Infrared Imager (SEVIRI) High rate information transmission (HRIT) data as input and the Operational Program on the Exchange of weather RAdar information (OPERA) ground-radar reflectivity data as ground truth. We use the mean of 4 InfraRed frequency channels as the input. The…
▽ More
This study explores the application of deep learning for rainfall prediction, leveraging the Spinning Enhanced Visible and Infrared Imager (SEVIRI) High rate information transmission (HRIT) data as input and the Operational Program on the Exchange of weather RAdar information (OPERA) ground-radar reflectivity data as ground truth. We use the mean of 4 InfraRed frequency channels as the input. The radiance images are forecasted up to 4 hours into the future using a dense optical flow algorithm. A conditional generative adversarial network (GAN) model is employed to transform the predicted radiance images into rainfall images which are aggregated over the 4 hour forecast period to generate cumulative rainfall values. This model scored a value of approximately 7.5 as the Continuous Ranked Probability Score (CRPS) in the Weather4Cast 2024 competition and placed 1st on the core challenge leaderboard.
△ Less
Submitted 30 November, 2024;
originally announced December 2024.
-
Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation
Authors:
Shreyas Chaudhari,
Ameet Deshpande,
Bruno Castro da Silva,
Philip S. Thomas
Abstract:
Evaluating policies using off-policy data is crucial for applying reinforcement learning to real-world problems such as healthcare and autonomous driving. Previous methods for off-policy evaluation (OPE) generally suffer from high variance or irreducible bias, leading to unacceptably high prediction errors. In this work, we introduce STAR, a framework for OPE that encompasses a broad range of esti…
▽ More
Evaluating policies using off-policy data is crucial for applying reinforcement learning to real-world problems such as healthcare and autonomous driving. Previous methods for off-policy evaluation (OPE) generally suffer from high variance or irreducible bias, leading to unacceptably high prediction errors. In this work, we introduce STAR, a framework for OPE that encompasses a broad range of estimators -- which include existing OPE methods as special cases -- that achieve lower mean squared prediction errors. STAR leverages state abstraction to distill complex, potentially continuous problems into compact, discrete models which we call abstract reward processes (ARPs). Predictions from ARPs estimated from off-policy data are provably consistent (asymptotically correct). Rather than proposing a specific estimator, we present a new framework for OPE and empirically demonstrate that estimators within STAR outperform existing methods. The best STAR estimator outperforms baselines in all twelve cases studied, and even the median STAR estimator surpasses the baselines in seven out of the twelve cases.
△ Less
Submitted 2 October, 2024;
originally announced October 2024.
-
PersonaGym: Evaluating Persona Agents and LLMs
Authors:
Vinay Samuel,
Henry Peng Zou,
Yue Zhou,
Shreyas Chaudhari,
Ashwin Kalyan,
Tanmay Rajpurohit,
Ameet Deshpande,
Karthik Narasimhan,
Vishvak Murahari
Abstract:
Persona agents, which are LLM agents that act according to an assigned persona, have demonstrated impressive contextual response capabilities across various applications. These persona agents offer significant enhancements across diverse sectors, such as education, healthcare, and entertainment, where model developers can align agent responses to different user requirements thereby broadening the…
▽ More
Persona agents, which are LLM agents that act according to an assigned persona, have demonstrated impressive contextual response capabilities across various applications. These persona agents offer significant enhancements across diverse sectors, such as education, healthcare, and entertainment, where model developers can align agent responses to different user requirements thereby broadening the scope of agent applications. However, evaluating persona agent performance is incredibly challenging due to the complexity of assessing persona adherence in free-form interactions across various environments that are relevant to each persona agent. We introduce PersonaGym, the first dynamic evaluation framework for assessing persona agents, and PersonaScore, the first automated human-aligned metric grounded in decision theory for comprehensive large-scale evaluation of persona agents. Our evaluation of 6 open and closed-source LLMs, using a benchmark encompassing 200 personas and 10,000 questions, reveals significant opportunities for advancement in persona agent capabilities across state-of-the-art models. For example, Claude 3.5 Sonnet only has a 2.97% relative improvement in PersonaScore than GPT 3.5 despite being a much more advanced model. Importantly, we find that increased model size and complexity do not necessarily imply enhanced persona agent capabilities thereby highlighting the pressing need for algorithmic and architectural invention towards faithful and performant persona agents.
△ Less
Submitted 18 December, 2024; v1 submitted 25 July, 2024;
originally announced July 2024.
-
RAVEN: Multitask Retrieval Augmented Vision-Language Learning
Authors:
Varun Nagaraj Rao,
Siddharth Choudhary,
Aditya Deshpande,
Ravi Kumar Satzoda,
Srikar Appalaraju
Abstract:
The scaling of large language models to encode all the world's knowledge in model parameters is unsustainable and has exacerbated resource barriers. Retrieval-Augmented Generation (RAG) presents a potential solution, yet its application to vision-language models (VLMs) is under explored. Existing methods focus on models designed for single tasks. Furthermore, they're limited by the need for resour…
▽ More
The scaling of large language models to encode all the world's knowledge in model parameters is unsustainable and has exacerbated resource barriers. Retrieval-Augmented Generation (RAG) presents a potential solution, yet its application to vision-language models (VLMs) is under explored. Existing methods focus on models designed for single tasks. Furthermore, they're limited by the need for resource intensive pre training, additional parameter requirements, unaddressed modality prioritization and lack of clear benefit over non-retrieval baselines. This paper introduces RAVEN, a multitask retrieval augmented VLM framework that enhances base VLMs through efficient, task specific fine-tuning. By integrating retrieval augmented samples without the need for additional retrieval-specific parameters, we show that the model acquires retrieval properties that are effective across multiple tasks. Our results and extensive ablations across retrieved modalities for the image captioning and VQA tasks indicate significant performance improvements compared to non retrieved baselines +1 CIDEr on MSCOCO, +4 CIDEr on NoCaps and nearly a +3\% accuracy on specific VQA question types. This underscores the efficacy of applying RAG approaches to VLMs, marking a stride toward more efficient and accessible multimodal learning.
△ Less
Submitted 27 June, 2024;
originally announced June 2024.
-
On the Power of Randomization in Fair Classification and Representation
Authors:
Sushant Agarwal,
Amit Deshpande
Abstract:
Fair classification and fair representation learning are two important problems in supervised and unsupervised fair machine learning, respectively. Fair classification asks for a classifier that maximizes accuracy on a given data distribution subject to fairness constraints. Fair representation maps a given data distribution over the original feature space to a distribution over a new representati…
▽ More
Fair classification and fair representation learning are two important problems in supervised and unsupervised fair machine learning, respectively. Fair classification asks for a classifier that maximizes accuracy on a given data distribution subject to fairness constraints. Fair representation maps a given data distribution over the original feature space to a distribution over a new representation space such that all classifiers over the representation satisfy fairness. In this paper, we examine the power of randomization in both these problems to minimize the loss of accuracy that results when we impose fairness constraints. Previous work on fair classification has characterized the optimal fair classifiers on a given data distribution that maximize accuracy subject to fairness constraints, e.g., Demographic Parity (DP), Equal Opportunity (EO), and Predictive Equality (PE). We refine these characterizations to demonstrate when the optimal randomized fair classifiers can surpass their deterministic counterparts in accuracy. We also show how the optimal randomized fair classifier that we characterize can be obtained as a solution to a convex optimization problem. Recent work has provided techniques to construct fair representations for a given data distribution such that any classifier over this representation satisfies DP. However, the classifiers on these fair representations either come with no or weak accuracy guarantees when compared to the optimal fair classifier on the original data distribution. Extending our ideas for randomized fair classification, we improve on these works, and construct DP-fair, EO-fair, and PE-fair representations that have provably optimal accuracy and suffer no accuracy loss compared to the optimal DP-fair, EO-fair, and PE-fair classifiers respectively on the original data distribution.
△ Less
Submitted 7 October, 2024; v1 submitted 5 June, 2024;
originally announced June 2024.
-
Data Efficient Behavior Cloning for Fine Manipulation via Continuity-based Corrective Labels
Authors:
Abhay Deshpande,
Liyiming Ke,
Quinn Pfeifer,
Abhishek Gupta,
Siddhartha S. Srinivasa
Abstract:
We consider imitation learning with access only to expert demonstrations, whose real-world application is often limited by covariate shift due to compounding errors during execution. We investigate the effectiveness of the Continuity-based Corrective Labels for Imitation Learning (CCIL) framework in mitigating this issue for real-world fine manipulation tasks. CCIL generates corrective labels by l…
▽ More
We consider imitation learning with access only to expert demonstrations, whose real-world application is often limited by covariate shift due to compounding errors during execution. We investigate the effectiveness of the Continuity-based Corrective Labels for Imitation Learning (CCIL) framework in mitigating this issue for real-world fine manipulation tasks. CCIL generates corrective labels by learning a locally continuous dynamics model from demonstrations to guide the agent back toward expert states. Through extensive experiments on peg insertion and fine grasping, we provide the first empirical validation that CCIL can significantly improve imitation learning performance despite discontinuities present in contact-rich manipulation. We find that: (1) real-world manipulation exhibits sufficient local smoothness to apply CCIL, (2) generated corrective labels are most beneficial in low-data regimes, and (3) label filtering based on estimated dynamics model error enables performance gains. To effectively apply CCIL to robotic domains, we offer a practical instantiation of the framework and insights into design choices and hyperparameter selection. Our work demonstrates CCIL's practicality for alleviating compounding errors in imitation learning on physical robots.
△ Less
Submitted 21 October, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Deception in Reinforced Autonomous Agents
Authors:
Atharvan Dogra,
Krishna Pillutla,
Ameet Deshpande,
Ananya B Sai,
John Nay,
Tanmay Rajpurohit,
Ashwin Kalyan,
Balaraman Ravindran
Abstract:
We explore the ability of large language model (LLM)-based agents to engage in subtle deception such as strategically phrasing and intentionally manipulating information to misguide and deceive other agents. This harmful behavior can be hard to detect, unlike blatant lying or unintentional hallucination. We build an adversarial testbed mimicking a legislative environment where two LLMs play opposi…
▽ More
We explore the ability of large language model (LLM)-based agents to engage in subtle deception such as strategically phrasing and intentionally manipulating information to misguide and deceive other agents. This harmful behavior can be hard to detect, unlike blatant lying or unintentional hallucination. We build an adversarial testbed mimicking a legislative environment where two LLMs play opposing roles: a corporate *lobbyist* proposing amendments to bills that benefit a specific company while evading a *critic* trying to detect this deception. We use real-world legislative bills matched with potentially affected companies to ground these interactions. Our results show that LLM lobbyists initially exhibit limited deception against strong LLM critics which can be further improved through simple verbal reinforcement, significantly enhancing their deceptive capabilities, and increasing deception rates by up to 40 points. This highlights the risk of autonomous agents manipulating other agents through seemingly neutral language to attain self-serving goals.
△ Less
Submitted 4 October, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository
Authors:
Ajinkya Deshpande,
Anmol Agarwal,
Shashank Shet,
Arun Iyer,
Aditya Kanade,
Ramakrishna Bairi,
Suresh Parthasarathy
Abstract:
LLMs have demonstrated significant potential in code generation tasks, achieving promising results at the function or statement level across various benchmarks. However, the complexities associated with creating code artifacts like classes, particularly within the context of real-world software repositories, remain underexplored. Prior research treats class-level generation as an isolated task, ne…
▽ More
LLMs have demonstrated significant potential in code generation tasks, achieving promising results at the function or statement level across various benchmarks. However, the complexities associated with creating code artifacts like classes, particularly within the context of real-world software repositories, remain underexplored. Prior research treats class-level generation as an isolated task, neglecting the intricate dependencies & interactions that characterize real-world software environments. To address this gap, we introduce RepoClassBench, a comprehensive benchmark designed to rigorously evaluate LLMs in generating complex, class-level code within real-world repositories. RepoClassBench includes "Natural Language to Class generation" tasks across Java, Python & C# from a selection of repositories. We ensure that each class in our dataset not only has cross-file dependencies within the repository but also includes corresponding test cases to verify its functionality. We find that current models struggle with the realistic challenges posed by our benchmark, primarily due to their limited exposure to relevant repository contexts. To address this shortcoming, we introduce Retrieve-Repotools-Reflect (RRR), a novel approach that equips LLMs with static analysis tools to iteratively navigate & reason about repository-level context in an agent-based framework. Our experiments demonstrate that RRR significantly outperforms existing baselines on RepoClassBench, showcasing its effectiveness across programming languages & under various settings. Our findings emphasize the critical need for code-generation benchmarks to incorporate repo-level dependencies to more accurately reflect the complexities of software development. Our work shows the benefits of leveraging specialized tools to enhance LLMs' understanding of repository context. We plan to make our dataset & evaluation harness public.
△ Less
Submitted 5 June, 2024; v1 submitted 21 April, 2024;
originally announced May 2024.
-
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
Authors:
Shreyas Chaudhari,
Pranjal Aggarwal,
Vishvak Murahari,
Tanmay Rajpurohit,
Ashwin Kalyan,
Karthik Narasimhan,
Ameet Deshpande,
Bruno Castro da Silva
Abstract:
State-of-the-art large language models (LLMs) have become indispensable tools for various tasks. However, training LLMs to serve as effective assistants for humans requires careful consideration. A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences and mitigate issues like toxicity and hal…
▽ More
State-of-the-art large language models (LLMs) have become indispensable tools for various tasks. However, training LLMs to serve as effective assistants for humans requires careful consideration. A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences and mitigate issues like toxicity and hallucinations. Yet, an understanding of RLHF for LLMs is largely entangled with initial design choices that popularized the method and current research focuses on augmenting those choices rather than fundamentally improving the framework. In this paper, we analyze RLHF through the lens of reinforcement learning principles to develop an understanding of its fundamentals, dedicating substantial focus to the core component of RLHF -- the reward model. Our study investigates modeling choices, caveats of function approximation, and their implications on RLHF training algorithms, highlighting the underlying assumptions made about the expressivity of reward. Our analysis improves the understanding of the role of reward models and methods for their training, concurrently revealing limitations of the current methodology. We characterize these limitations, including incorrect generalization, model misspecification, and the sparsity of feedback, along with their impact on the performance of a language model. The discussion and analysis are substantiated by a categorical review of current literature, serving as a reference for researchers and practitioners to understand the challenges of RLHF and build upon existing efforts.
△ Less
Submitted 15 April, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
Characterizing Flow Complexity in Transportation Networks using Graph Homology
Authors:
Shashank A Deshpande,
Hamsa Balakrishnan
Abstract:
Series-parallel network topologies generally exhibit simplified dynamical behavior and avoid high combinatorial complexity. A comprehensive analysis of how flow complexity emerges with a graph's deviation from series-parallel topology is therefore of fundamental interest. We introduce the notion of a robust $k$-path on a directed acycylic graph, with increasing values of the length $k$ reflecting…
▽ More
Series-parallel network topologies generally exhibit simplified dynamical behavior and avoid high combinatorial complexity. A comprehensive analysis of how flow complexity emerges with a graph's deviation from series-parallel topology is therefore of fundamental interest. We introduce the notion of a robust $k$-path on a directed acycylic graph, with increasing values of the length $k$ reflecting increasing deviations. We propose a graph homology with robust $k$-paths as the bases of its chain spaces. In this framework, the topological simplicity of series-parallel graphs translates into a triviality of higher-order chain spaces. We discuss a correspondence between the space of order-three chains and sites within the network that are susceptible to the Braess paradox, a well-known phenomenon in transportation networks. In this manner, we illustrate the utility of the proposed graph homology in sytematically studying the complexity of flow networks.
△ Less
Submitted 8 March, 2024;
originally announced March 2024.
-
To Store or Not to Store: a graph theoretical approach for Dataset Versioning
Authors:
Anxin Guo,
Jingwei Li,
Pattara Sukprasert,
Samir Khuller,
Amol Deshpande,
Koyel Mukherjee
Abstract:
In this work, we study the cost efficient data versioning problem, where the goal is to optimize the storage and reconstruction (retrieval) costs of data versions, given a graph of datasets as nodes and edges capturing edit/delta information. One central variant we study is MinSum Retrieval (MSR) where the goal is to minimize the total retrieval costs, while keeping the storage costs bounded. This…
▽ More
In this work, we study the cost efficient data versioning problem, where the goal is to optimize the storage and reconstruction (retrieval) costs of data versions, given a graph of datasets as nodes and edges capturing edit/delta information. One central variant we study is MinSum Retrieval (MSR) where the goal is to minimize the total retrieval costs, while keeping the storage costs bounded. This problem (along with its variants) was introduced by Bhattacherjee et al. [VLDB'15]. While such problems are frequently encountered in collaborative tools (e.g., version control systems and data analysis pipelines), to the best of our knowledge, no existing research studies the theoretical aspects of these problems.
We establish that the currently best-known heuristic, LMG, can perform arbitrarily badly in a simple worst case. Moreover, we show that it is hard to get $o(n)$-approximation for MSR on general graphs even if we relax the storage constraints by an $O(\log n)$ factor. Similar hardness results are shown for other variants. Meanwhile, we propose poly-time approximation schemes for tree-like graphs, motivated by the fact that the graphs arising in practice from typical edit operations are often not arbitrary. As version graphs typically have low treewidth, we further develop new algorithms for bounded treewidth graphs.
Furthermore, we propose two new heuristics and evaluate them empirically. First, we extend LMG by considering more potential ``moves'', to propose a new heuristic LMG-All. LMG-All consistently outperforms LMG while having comparable run time on a wide variety of datasets, i.e., version graphs. Secondly, we apply our tree algorithms on the minimum-storage arborescence of an instance, yielding algorithms that are qualitatively better than all previous heuristics for MSR, as well as for another variant BoundedMin Retrieval (BMR).
△ Less
Submitted 18 February, 2024;
originally announced February 2024.
-
NICE: To Optimize In-Context Examples or Not?
Authors:
Pragya Srivastava,
Satvik Golechha,
Amit Deshpande,
Amit Sharma
Abstract:
Recent work shows that in-context learning and optimization of in-context examples (ICE) can significantly improve the accuracy of large language models (LLMs) on a wide range of tasks, leading to an apparent consensus that ICE optimization is crucial for better performance. However, most of these studies assume a fixed or no instruction provided in the prompt. We challenge this consensus by inves…
▽ More
Recent work shows that in-context learning and optimization of in-context examples (ICE) can significantly improve the accuracy of large language models (LLMs) on a wide range of tasks, leading to an apparent consensus that ICE optimization is crucial for better performance. However, most of these studies assume a fixed or no instruction provided in the prompt. We challenge this consensus by investigating the necessity of optimizing ICE when task-specific instructions are provided and find that there are many tasks for which it yields diminishing returns. In particular, using a diverse set of tasks and a systematically created instruction set with gradually added details, we find that as the prompt instruction becomes more detailed, the returns on ICE optimization diminish. To characterize this behavior, we introduce a task-specific metric called Normalized Invariability to Choice of Examples (NICE) that quantifies the learnability of tasks from a given instruction, and provides a heuristic to help decide whether to optimize instructions or ICE for a new task. Given a task, the proposed metric can reliably predict the utility of optimizing ICE compared to using random ICE. Our code is available at https://github.com/microsoft/nice-icl.
△ Less
Submitted 6 June, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
-
Flock: A Low-Cost Streaming Query Engine on FaaS Platforms
Authors:
Gang Liao,
Amol Deshpande,
Daniel J. Abadi
Abstract:
Existing serverless data analytics systems rely on external storage services like S3 for data shuffling and communication between cloud functions. While this approach provides the elasticity benefits of serverless computing, it incurs additional latency and cost overheads. We present Flock, a novel cloud-native streaming query engine that leverages the on-demand scalability of FaaS platforms for r…
▽ More
Existing serverless data analytics systems rely on external storage services like S3 for data shuffling and communication between cloud functions. While this approach provides the elasticity benefits of serverless computing, it incurs additional latency and cost overheads. We present Flock, a novel cloud-native streaming query engine that leverages the on-demand scalability of FaaS platforms for real-time data analytics. Flock utilizes function invocation payloads for efficient data exchange, eliminating the need for external storage. This not only reduces latency and cost but also simplifies the architecture by removing the requirement for a centralized coordinator. Flock employs a template-based approach to dynamically create cloud functions for each query stage and a function group mechanism for handling data aggregation and shuffling. It supports both SQL and DataFrame APIs, making it easy to use. Our evaluation shows that Flock provides significant performance gains and cost savings compared to existing serverless and serverful streaming systems. It outperforms Apache Flink by 10-20x in cost while achieving similar latency and throughput.
△ Less
Submitted 21 April, 2024; v1 submitted 27 December, 2023;
originally announced December 2023.
-
Rethinking Robustness of Model Attributions
Authors:
Sandesh Kamath,
Sankalp Mittal,
Amit Deshpande,
Vineeth N Balasubramanian
Abstract:
For machine learning models to be reliable and trustworthy, their decisions must be interpretable. As these models find increasing use in safety-critical applications, it is important that not just the model predictions but also their explanations (as feature attributions) be robust to small human-imperceptible input perturbations. Recent works have shown that many attribution methods are fragile…
▽ More
For machine learning models to be reliable and trustworthy, their decisions must be interpretable. As these models find increasing use in safety-critical applications, it is important that not just the model predictions but also their explanations (as feature attributions) be robust to small human-imperceptible input perturbations. Recent works have shown that many attribution methods are fragile and have proposed improvements in either these methods or the model training. We observe two main causes for fragile attributions: first, the existing metrics of robustness (e.g., top-k intersection) over-penalize even reasonable local shifts in attribution, thereby making random perturbations to appear as a strong attack, and second, the attribution can be concentrated in a small region even when there are multiple important parts in an image. To rectify this, we propose simple ways to strengthen existing metrics and attribution methods that incorporate locality of pixels in robustness metrics and diversity of pixel locations in attributions. Towards the role of model training in attributional robustness, we empirically observe that adversarially trained models have more robust attributions on smaller datasets, however, this advantage disappears in larger datasets. Code is available at https://github.com/ksandeshk/LENS.
△ Less
Submitted 16 December, 2023;
originally announced December 2023.
-
How Far Can Fairness Constraints Help Recover From Biased Data?
Authors:
Mohit Sharma,
Amit Deshpande
Abstract:
A general belief in fair classification is that fairness constraints incur a trade-off with accuracy, which biased data may worsen. Contrary to this belief, Blum & Stangl (2019) show that fair classification with equal opportunity constraints even on extremely biased data can recover optimally accurate and fair classifiers on the original data distribution. Their result is interesting because it d…
▽ More
A general belief in fair classification is that fairness constraints incur a trade-off with accuracy, which biased data may worsen. Contrary to this belief, Blum & Stangl (2019) show that fair classification with equal opportunity constraints even on extremely biased data can recover optimally accurate and fair classifiers on the original data distribution. Their result is interesting because it demonstrates that fairness constraints can implicitly rectify data bias and simultaneously overcome a perceived fairness-accuracy trade-off. Their data bias model simulates under-representation and label bias in underprivileged population, and they show the above result on a stylized data distribution with i.i.d. label noise, under simple conditions on the data distribution and bias parameters. We propose a general approach to extend the result of Blum & Stangl (2019) to different fairness constraints, data bias models, data distributions, and hypothesis classes. We strengthen their result, and extend it to the case when their stylized distribution has labels with Massart noise instead of i.i.d. noise. We prove a similar recovery result for arbitrary data distributions using fair reject option classifiers. We further generalize it to arbitrary data distributions and arbitrary hypothesis classes, i.e., we prove that for any data distribution, if the optimally accurate classifier in a given hypothesis class is fair and robust, then it can be recovered through fair classification with equal opportunity constraints on the biased distribution whenever the bias parameters satisfy certain simple conditions. Finally, we show applications of our technique to time-varying data bias in classification and fair machine learning pipelines.
△ Less
Submitted 1 June, 2024; v1 submitted 16 December, 2023;
originally announced December 2023.
-
BaRiFlex: A Robotic Gripper with Versatility and Collision Robustness for Robot Learning
Authors:
Gu-Cheol Jeong,
Arpit Bahety,
Gabriel Pedraza,
Ashish D. Deshpande,
Roberto Martín-Martín
Abstract:
We present a new approach to robot hand design specifically suited for successfully implementing robot learning methods to accomplish tasks in daily human environments. We introduce BaRiFlex, an innovative gripper design that alleviates the issues caused by unexpected contact and collisions during robot learning, offering robustness, grasping versatility, task versatility, and simplicity to the le…
▽ More
We present a new approach to robot hand design specifically suited for successfully implementing robot learning methods to accomplish tasks in daily human environments. We introduce BaRiFlex, an innovative gripper design that alleviates the issues caused by unexpected contact and collisions during robot learning, offering robustness, grasping versatility, task versatility, and simplicity to the learning processes. This achievement is enabled by the incorporation of low-inertia actuators, providing high Back-drivability, and the strategic combination of Rigid and Flexible materials which enhances versatility and the gripper's resilience against unpredicted collisions. Furthermore, the integration of flexible Fin-Ray linkages and rigid linkages allows the gripper to execute compliant grasping and precise pinching. We conducted rigorous performance tests to characterize the novel gripper's compliance, durability, grasping and task versatility, and precision. We also integrated the BaRiFlex with a 7 Degree of Freedom (DoF) Franka Emika's Panda robotic arm to evaluate its capacity to support a trial-and-error (reinforcement learning) training procedure. The results of our experimental study are then compared to those obtained using the original rigid Franka Hand and a reference Fin-Ray soft gripper, demonstrating the superior capabilities and advantages of our developed gripper system.
△ Less
Submitted 8 December, 2023;
originally announced December 2023.
-
Energy-Efficient Internet of Things Monitoring with Content-Based Wake-Up Radio
Authors:
Anay Ajit Deshpande,
Federico Chiariotti,
Andrea Zanella
Abstract:
The use of Wake-Up Radio (WUR) in Internet of Things (IoT) networks can significantly improve their energy efficiency: battery-powered sensors can remain in a low-power (sleep) mode while listening for wake-up messages using their WUR and reactivate only when polled. However, polling-based WUR may still lead to wasted energy if values sensed by the polled sensors provide no new information to the…
▽ More
The use of Wake-Up Radio (WUR) in Internet of Things (IoT) networks can significantly improve their energy efficiency: battery-powered sensors can remain in a low-power (sleep) mode while listening for wake-up messages using their WUR and reactivate only when polled. However, polling-based WUR may still lead to wasted energy if values sensed by the polled sensors provide no new information to the receiver, or in general have a low Value of Information (VoI). In this paper, we design a content-based WUR that tracks the process observed by the sensors and only wakes up the sensor if its estimated update's VoI is higher than a threshold communicated through the poll. If the sensor does not reply to the polling request, the Gateway (GW) can make a Bayesian update, knowing that either the sensor value substantially confirms its current estimate or the transmission failed due to the wireless channel. We analyze the trade-off between the tracking error and the battery lifetime of the sensors, showing that content-based WUR can provide fine-grained control of this trade-off and significantly increase the battery lifetime of the node with a minimal Mean Squared Error (MSE) increase.
△ Less
Submitted 7 December, 2023;
originally announced December 2023.
-
Student Activity Recognition in Classroom Environments using Transfer Learning
Authors:
Anagha Deshpande,
Vedant Deshpande
Abstract:
The recent advances in artificial intelligence and deep learning facilitate automation in various applications including home automation, smart surveillance systems, and healthcare among others. Human Activity Recognition is one of its emerging applications, which can be implemented in a classroom environment to enhance safety, efficiency, and overall educational quality. This paper proposes a sys…
▽ More
The recent advances in artificial intelligence and deep learning facilitate automation in various applications including home automation, smart surveillance systems, and healthcare among others. Human Activity Recognition is one of its emerging applications, which can be implemented in a classroom environment to enhance safety, efficiency, and overall educational quality. This paper proposes a system for detecting and recognizing the activities of students in a classroom environment. The dataset has been structured and recorded by the authors since a standard dataset for this task was not available at the time of this study. Transfer learning, a widely adopted method within the field of deep learning, has proven to be helpful in complex tasks like image and video processing. Pretrained models including VGG-16, ResNet-50, InceptionV3, and Xception are used for feature extraction and classification tasks. Xception achieved an accuracy of 93%, on the novel classroom dataset, outperforming the other three models in consideration. The system proposed in this study aims to introduce a safer and more productive learning environment for students and educators.
△ Less
Submitted 30 November, 2023;
originally announced December 2023.
-
GEO: Generative Engine Optimization
Authors:
Pranjal Aggarwal,
Vishvak Murahari,
Tanmay Rajpurohit,
Ashwin Kalyan,
Karthik Narasimhan,
Ameet Deshpande
Abstract:
The advent of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. This emerging technology, which we formalize under the unified framework of generative engines (GEs), can generate accurate and personalized responses, rapidly replacing traditional search engines like Google and Bing. Gen…
▽ More
The advent of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. This emerging technology, which we formalize under the unified framework of generative engines (GEs), can generate accurate and personalized responses, rapidly replacing traditional search engines like Google and Bing. Generative Engines typically satisfy queries by synthesizing information from multiple sources and summarizing them using LLMs. While this shift significantly improves $\textit{user}$ utility and $\textit{generative search engine}$ traffic, it poses a huge challenge for the third stakeholder -- website and content creators. Given the black-box and fast-moving nature of generative engines, content creators have little to no control over $\textit{when}$ and $\textit{how}$ their content is displayed. With generative engines here to stay, we must ensure the creator economy is not disadvantaged. To address this, we introduce Generative Engine Optimization (GEO), the first novel paradigm to aid content creators in improving their content visibility in generative engine responses through a flexible black-box optimization framework for optimizing and defining visibility metrics. We facilitate systematic evaluation by introducing GEO-bench, a large-scale benchmark of diverse user queries across multiple domains, along with relevant web sources to answer these queries. Through rigorous evaluation, we demonstrate that GEO can boost visibility by up to $40\%$ in generative engine responses. Moreover, we show the efficacy of these strategies varies across domains, underscoring the need for domain-specific optimization methods. Our work opens a new frontier in information discovery systems, with profound implications for both developers of generative engines and content creators.
△ Less
Submitted 28 June, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
Authors:
Shashank Gupta,
Vaishnavi Shrivastava,
Ameet Deshpande,
Ashwin Kalyan,
Peter Clark,
Ashish Sabharwal,
Tushar Khot
Abstract:
Recent works have showcased the ability of LLMs to embody diverse personas in their responses, exemplified by prompts like 'You are Yoda. Explain the Theory of Relativity.' While this ability allows personalization of LLMs and enables human behavior simulation, its effect on LLMs' capabilities remains unclear. To fill this gap, we present the first extensive study of the unintended side-effects of…
▽ More
Recent works have showcased the ability of LLMs to embody diverse personas in their responses, exemplified by prompts like 'You are Yoda. Explain the Theory of Relativity.' While this ability allows personalization of LLMs and enables human behavior simulation, its effect on LLMs' capabilities remains unclear. To fill this gap, we present the first extensive study of the unintended side-effects of persona assignment on the ability of LLMs to perform basic reasoning tasks. Our study covers 24 reasoning datasets, 4 LLMs, and 19 diverse personas (e.g. an Asian person) spanning 5 socio-demographic groups. Our experiments unveil that LLMs harbor deep rooted bias against various socio-demographics underneath a veneer of fairness. While they overtly reject stereotypes when explicitly asked ('Are Black people less skilled at mathematics?'), they manifest stereotypical and erroneous presumptions when asked to answer questions while adopting a persona. These can be observed as abstentions in responses, e.g., 'As a Black person, I can't answer this question as it requires math knowledge', and generally result in a substantial performance drop. Our experiments with ChatGPT-3.5 show that this bias is ubiquitous - 80% of our personas demonstrate bias; it is significant - some datasets show performance drops of 70%+; and can be especially harmful for certain groups - some personas suffer statistically significant drops on 80%+ of the datasets. Overall, all 4 LLMs exhibit this bias to varying extents, with GPT-4-Turbo showing the least but still a problematic amount of bias (evident in 42% of the personas). Further analysis shows that these persona-induced errors can be hard-to-discern and hard-to-avoid. Our findings serve as a cautionary tale that the practice of assigning personas to LLMs - a trend on the rise - can surface their deep-rooted biases and have unforeseeable and detrimental side-effects.
△ Less
Submitted 27 January, 2024; v1 submitted 8 November, 2023;
originally announced November 2023.
-
QualEval: Qualitative Evaluation for Model Improvement
Authors:
Vishvak Murahari,
Ameet Deshpande,
Peter Clark,
Tanmay Rajpurohit,
Ashish Sabharwal,
Karthik Narasimhan,
Ashwin Kalyan
Abstract:
Quantitative evaluation metrics have traditionally been pivotal in gauging the advancements of artificial intelligence systems, including large language models (LLMs). However, these metrics have inherent limitations. Given the intricate nature of real-world tasks, a single scalar to quantify and compare is insufficient to capture the fine-grained nuances of model behavior. Metrics serve only as a…
▽ More
Quantitative evaluation metrics have traditionally been pivotal in gauging the advancements of artificial intelligence systems, including large language models (LLMs). However, these metrics have inherent limitations. Given the intricate nature of real-world tasks, a single scalar to quantify and compare is insufficient to capture the fine-grained nuances of model behavior. Metrics serve only as a way to compare and benchmark models, and do not yield actionable diagnostics, thus making the model improvement process challenging. Model developers find themselves amid extensive manual efforts involving sifting through vast datasets and attempting hit-or-miss adjustments to training data or setups. In this work, we address the shortcomings of quantitative metrics by proposing QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement. QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights that when applied, accelerate model improvement. The insights are backed by a comprehensive dashboard with fine-grained visualizations and human-interpretable analyses. We corroborate the faithfulness of QualEval by demonstrating that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative on a challenging dialogue task (DialogSum) when compared to baselines. QualEval successfully increases the pace of model development, thus in essence serving as a data-scientist-in-a-box. Given the focus on critiquing and improving current evaluation metrics, our method serves as a refreshingly new technique for both model evaluation and improvement.
△ Less
Submitted 5 May, 2024; v1 submitted 5 November, 2023;
originally announced November 2023.
-
CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning
Authors:
Liyiming Ke,
Yunchu Zhang,
Abhay Deshpande,
Siddhartha Srinivasa,
Abhishek Gupta
Abstract:
We present a new technique to enhance the robustness of imitation learning methods by generating corrective data to account for compounding errors and disturbances. While existing methods rely on interactive expert labeling, additional offline datasets, or domain-specific invariances, our approach requires minimal additional assumptions beyond access to expert data. The key insight is to leverage…
▽ More
We present a new technique to enhance the robustness of imitation learning methods by generating corrective data to account for compounding errors and disturbances. While existing methods rely on interactive expert labeling, additional offline datasets, or domain-specific invariances, our approach requires minimal additional assumptions beyond access to expert data. The key insight is to leverage local continuity in the environment dynamics to generate corrective labels. Our method first constructs a dynamics model from the expert demonstration, encouraging local Lipschitz continuity in the learned model. In locally continuous regions, this model allows us to generate corrective labels within the neighborhood of the demonstrations but beyond the actual set of states and actions in the dataset. Training on this augmented data enhances the agent's ability to recover from perturbations and deal with compounding errors. We demonstrate the effectiveness of our generated labels through experiments in a variety of robotics domains in simulation that have distinct forms of continuity and discontinuity, including classic control problems, drone flying, navigation with high-dimensional sensor observations, legged locomotion, and tabletop manipulation.
△ Less
Submitted 3 June, 2024; v1 submitted 19 October, 2023;
originally announced October 2023.
-
Key-phrase boosted unsupervised summary generation for FinTech organization
Authors:
Aadit Deshpande,
Shreya Goyal,
Prateek Nagwanshi,
Avinash Tripathy
Abstract:
With the recent advances in social media, the use of NLP techniques in social media data analysis has become an emerging research direction. Business organizations can particularly benefit from such an analysis of social media discourse, providing an external perspective on consumer behavior. Some of the NLP applications such as intent detection, sentiment classification, text summarization can he…
▽ More
With the recent advances in social media, the use of NLP techniques in social media data analysis has become an emerging research direction. Business organizations can particularly benefit from such an analysis of social media discourse, providing an external perspective on consumer behavior. Some of the NLP applications such as intent detection, sentiment classification, text summarization can help FinTech organizations to utilize the social media language data to find useful external insights and can be further utilized for downstream NLP tasks. Particularly, a summary which highlights the intents and sentiments of the users can be very useful for these organizations to get an external perspective. This external perspective can help organizations to better manage their products, offers, promotional campaigns, etc. However, certain challenges, such as a lack of labeled domain-specific datasets impede further exploration of these tasks in the FinTech domain. To overcome these challenges, we design an unsupervised phrase-based summary generation from social media data, using 'Action-Object' pairs (intent phrases). We evaluated the proposed method with other key-phrase based summary generation methods in the direction of contextual information of various Reddit discussion threads, available in the different summaries. We introduce certain "Context Metrics" such as the number of Unique words, Action-Object pairs, and Noun chunks to evaluate the contextual information retrieved from the source text in these phrase-based summaries. We demonstrate that our methods significantly outperform the baseline on these metrics, thus providing a qualitative and quantitative measure of their efficacy. Proposed framework has been leveraged as a web utility portal hosted within Amex.
△ Less
Submitted 16 October, 2023;
originally announced October 2023.
-
FiGURe: Simple and Efficient Unsupervised Node Representations with Filter Augmentations
Authors:
Chanakya Ekbote,
Ajinkya Pankaj Deshpande,
Arun Iyer,
Ramakrishna Bairi,
Sundararajan Sellamanickam
Abstract:
Unsupervised node representations learnt using contrastive learning-based methods have shown good performance on downstream tasks. However, these methods rely on augmentations that mimic low-pass filters, limiting their performance on tasks requiring different eigen-spectrum parts. This paper presents a simple filter-based augmentation method to capture different parts of the eigen-spectrum. We sh…
▽ More
Unsupervised node representations learnt using contrastive learning-based methods have shown good performance on downstream tasks. However, these methods rely on augmentations that mimic low-pass filters, limiting their performance on tasks requiring different eigen-spectrum parts. This paper presents a simple filter-based augmentation method to capture different parts of the eigen-spectrum. We show significant improvements using these augmentations. Further, we show that sharing the same weights across these different filter augmentations is possible, reducing the computational load. In addition, previous works have shown that good performance on downstream tasks requires high dimensional representations. Working with high dimensions increases the computations, especially when multiple augmentations are involved. We mitigate this problem and recover good performance through lower dimensional embeddings using simple random Fourier feature projections. Our method, FiGURe achieves an average gain of up to 4.4%, compared to the state-of-the-art unsupervised models, across all datasets in consideration, both homophilic and heterophilic. Our code can be found at: https://github.com/microsoft/figure.
△ Less
Submitted 4 October, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
PBP: Path-based Trajectory Prediction for Autonomous Driving
Authors:
Sepideh Afshar,
Nachiket Deo,
Akshay Bhagat,
Titas Chakraborty,
Yunming Shao,
Balarama Raju Buddharaju,
Adwait Deshpande,
Henggang Cui
Abstract:
Trajectory prediction plays a crucial role in the autonomous driving stack by enabling autonomous vehicles to anticipate the motion of surrounding agents. Goal-based prediction models have gained traction in recent years for addressing the multimodal nature of future trajectories. Goal-based prediction models simplify multimodal prediction by first predicting 2D goal locations of agents and then p…
▽ More
Trajectory prediction plays a crucial role in the autonomous driving stack by enabling autonomous vehicles to anticipate the motion of surrounding agents. Goal-based prediction models have gained traction in recent years for addressing the multimodal nature of future trajectories. Goal-based prediction models simplify multimodal prediction by first predicting 2D goal locations of agents and then predicting trajectories conditioned on each goal. However, a single 2D goal location serves as a weak inductive bias for predicting the whole trajectory, often leading to poor map compliance, i.e., part of the trajectory going off-road or breaking traffic rules. In this paper, we improve upon goal-based prediction by proposing the Path-based prediction (PBP) approach. PBP predicts a discrete probability distribution over reference paths in the HD map using the path features and predicts trajectories in the path-relative Frenet frame. We applied the PBP trajectory decoder on top of the HiVT scene encoder and report results on the Argoverse dataset. Our experiments show that PBP achieves competitive performance on the standard trajectory prediction metrics, while significantly outperforming state-of-the-art baselines in terms of map compliance.
△ Less
Submitted 2 March, 2024; v1 submitted 7 September, 2023;
originally announced September 2023.
-
Improved Outlier Robust Seeding for k-means
Authors:
Amit Deshpande,
Rameshwar Pratap
Abstract:
The $k$-means is a popular clustering objective, although it is inherently non-robust and sensitive to outliers. Its popular seeding or initialization called $k$-means++ uses $D^{2}$ sampling and comes with a provable $O(\log k)$ approximation guarantee \cite{AV2007}. However, in the presence of adversarial noise or outliers, $D^{2}$ sampling is more likely to pick centers from distant outliers in…
▽ More
The $k$-means is a popular clustering objective, although it is inherently non-robust and sensitive to outliers. Its popular seeding or initialization called $k$-means++ uses $D^{2}$ sampling and comes with a provable $O(\log k)$ approximation guarantee \cite{AV2007}. However, in the presence of adversarial noise or outliers, $D^{2}$ sampling is more likely to pick centers from distant outliers instead of inlier clusters, and therefore its approximation guarantees \textit{w.r.t.} $k$-means solution on inliers, does not hold.
Assuming that the outliers constitute a constant fraction of the given data, we propose a simple variant in the $D^2$ sampling distribution, which makes it robust to the outliers. Our algorithm runs in $O(ndk)$ time, outputs $O(k)$ clusters, discards marginally more points than the optimal number of outliers, and comes with a provable $O(1)$ approximation guarantee.
Our algorithm can also be modified to output exactly $k$ clusters instead of $O(k)$ clusters, while keeping its running time linear in $n$ and $d$. This is an improvement over previous results for robust $k$-means based on LP relaxation and rounding \cite{Charikar}, \cite{KrishnaswamyLS18} and \textit{robust $k$-means++} \cite{DeshpandeKP20}. Our empirical results show the advantage of our algorithm over $k$-means++~\cite{AV2007}, uniform random seeding, greedy sampling for $k$ means~\cite{tkmeanspp}, and robust $k$-means++~\cite{DeshpandeKP20}, on standard real-world and synthetic data sets used in previous work. Our proposal is easily amenable to scalable, faster, parallel implementations of $k$-means++ \cite{Bahmani,BachemL017} and is of independent interest for coreset constructions in the presence of outliers \cite{feldman2007ptas,langberg2010universal,feldman2011unified}.
△ Less
Submitted 6 September, 2023;
originally announced September 2023.
-
Distraction-free Embeddings for Robust VQA
Authors:
Atharvan Dogra,
Deeksha Varshney,
Ashwin Kalyan,
Ameet Deshpande,
Neeraj Kumar
Abstract:
The generation of effective latent representations and their subsequent refinement to incorporate precise information is an essential prerequisite for Vision-Language Understanding (VLU) tasks such as Video Question Answering (VQA). However, most existing methods for VLU focus on sparsely sampling or fine-graining the input information (e.g., sampling a sparse set of frames or text tokens), or add…
▽ More
The generation of effective latent representations and their subsequent refinement to incorporate precise information is an essential prerequisite for Vision-Language Understanding (VLU) tasks such as Video Question Answering (VQA). However, most existing methods for VLU focus on sparsely sampling or fine-graining the input information (e.g., sampling a sparse set of frames or text tokens), or adding external knowledge. We present a novel "DRAX: Distraction Removal and Attended Cross-Alignment" method to rid our cross-modal representations of distractors in the latent space. We do not exclusively confine the perception of any input information from various modalities but instead use an attention-guided distraction removal method to increase focus on task-relevant information in latent embeddings. DRAX also ensures semantic alignment of embeddings during cross-modal fusions. We evaluate our approach on a challenging benchmark (SUTD-TrafficQA dataset), testing the framework's abilities for feature and event queries, temporal relation understanding, forecasting, hypothesis, and causal analysis through extensive experiments.
△ Less
Submitted 31 August, 2023;
originally announced September 2023.
-
Optimizing Group-Fair Plackett-Luce Ranking Models for Relevance and Ex-Post Fairness
Authors:
Sruthi Gorantla,
Eshaan Bhansali,
Amit Deshpande,
Anand Louis
Abstract:
In learning-to-rank (LTR), optimizing only the relevance (or the expected ranking utility) can cause representational harm to certain categories of items. Moreover, if there is implicit bias in the relevance scores, LTR models may fail to optimize for true relevance. Previous works have proposed efficient algorithms to train stochastic ranking models that achieve fairness of exposure to the groups…
▽ More
In learning-to-rank (LTR), optimizing only the relevance (or the expected ranking utility) can cause representational harm to certain categories of items. Moreover, if there is implicit bias in the relevance scores, LTR models may fail to optimize for true relevance. Previous works have proposed efficient algorithms to train stochastic ranking models that achieve fairness of exposure to the groups ex-ante (or, in expectation), which may not guarantee representation fairness to the groups ex-post, that is, after realizing a ranking from the stochastic ranking model. Typically, ex-post fairness is achieved by post-processing, but previous work does not train stochastic ranking models that are aware of this post-processing.
In this paper, we propose a novel objective that maximizes expected relevance only over those rankings that satisfy given representation constraints to ensure ex-post fairness. Building upon recent work on an efficient sampler for ex-post group-fair rankings, we propose a group-fair Plackett-Luce model and show that it can be efficiently optimized for our objective in the LTR framework.
Experiments on three real-world datasets show that our group-fair algorithm guarantees fairness alongside usually having better relevance compared to the LTR baselines. In addition, our algorithm also achieves better relevance than post-processing baselines, which also ensures ex-post fairness. Further, when implicit bias is injected into the training data, our algorithm typically outperforms existing LTR baselines in relevance.
△ Less
Submitted 25 August, 2023;
originally announced August 2023.
-
Low-Latency Massive Access with Multicast Wake Up Radio
Authors:
Anay Ajit Deshpande,
Federico Chiariotti,
Andrea Zanella
Abstract:
The use of Wake-Up Radio (WUR) in Internet of Things (IoT) networks can significantly improve their energy efficiency: battery-powered sensors can remain in a low-power (sleep) mode while listening for wake-up messages using their WUR and reactivate only when polled, saving energy. However, polling-based Time Division Multiple Access (TDMA) may significantly increase data transmission delay if pac…
▽ More
The use of Wake-Up Radio (WUR) in Internet of Things (IoT) networks can significantly improve their energy efficiency: battery-powered sensors can remain in a low-power (sleep) mode while listening for wake-up messages using their WUR and reactivate only when polled, saving energy. However, polling-based Time Division Multiple Access (TDMA) may significantly increase data transmission delay if packets are generated sporadically, as nodes with no information still need to be polled. In this paper, we examine the effect of multicast polling for WUR-enabled wireless nodes. The idea is to assign nodes to multicast groups so that all nodes in the same group can be solicited by a multicast polling message. This may cause collisions, which can be solved by requesting retransmissions from the involved nodes. We analyze the performance of different multicast polling and retransmission strategies, showing that the optimal approach can significantly reduce the delay over TDMA and ALOHA in low-traffic scenarios while keeping good energy efficiency.
△ Less
Submitted 27 July, 2023;
originally announced July 2023.
-
Artificial Intelligence for the Electron Ion Collider (AI4EIC)
Authors:
C. Allaire,
R. Ammendola,
E. -C. Aschenauer,
M. Balandat,
M. Battaglieri,
J. Bernauer,
M. Bondì,
N. Branson,
T. Britton,
A. Butter,
I. Chahrour,
P. Chatagnon,
E. Cisbani,
E. W. Cline,
S. Dash,
C. Dean,
W. Deconinck,
A. Deshpande,
M. Diefenthaler,
R. Ent,
C. Fanelli,
M. Finger,
M. Finger, Jr.,
E. Fol,
S. Furletov
, et al. (70 additional authors not shown)
Abstract:
The Electron-Ion Collider (EIC), a state-of-the-art facility for studying the strong force, is expected to begin commissioning its first experiments in 2028. This is an opportune time for artificial intelligence (AI) to be included from the start at this facility and in all phases that lead up to the experiments. The second annual workshop organized by the AI4EIC working group, which recently took…
▽ More
The Electron-Ion Collider (EIC), a state-of-the-art facility for studying the strong force, is expected to begin commissioning its first experiments in 2028. This is an opportune time for artificial intelligence (AI) to be included from the start at this facility and in all phases that lead up to the experiments. The second annual workshop organized by the AI4EIC working group, which recently took place, centered on exploring all current and prospective application areas of AI for the EIC. This workshop is not only beneficial for the EIC, but also provides valuable insights for the newly established ePIC collaboration at EIC. This paper summarizes the different activities and R&D projects covered across the sessions of the workshop and provides an overview of the goals, approaches and strategies regarding AI/ML in the EIC community, as well as cutting-edge techniques currently studied in other experiments.
△ Less
Submitted 17 July, 2023;
originally announced July 2023.
-
InstructEval: Systematic Evaluation of Instruction Selection Methods
Authors:
Anirudh Ajith,
Chris Pan,
Mengzhou Xia,
Ameet Deshpande,
Karthik Narasimhan
Abstract:
In-context learning (ICL) performs tasks by prompting a large language model (LLM) using an instruction and a small set of annotated examples called demonstrations. Recent work has shown that precise details of the inputs used in the ICL prompt significantly impact performance, which has incentivized instruction selection algorithms. The effect of instruction-choice however is severely underexplor…
▽ More
In-context learning (ICL) performs tasks by prompting a large language model (LLM) using an instruction and a small set of annotated examples called demonstrations. Recent work has shown that precise details of the inputs used in the ICL prompt significantly impact performance, which has incentivized instruction selection algorithms. The effect of instruction-choice however is severely underexplored, with existing analyses restricted to shallow subsets of models and tasks, limiting the generalizability of their insights. We develop InstructEval, an ICL evaluation suite to conduct a thorough assessment of these techniques. The suite includes 13 open-sourced LLMs of varying scales from four model families, and covers nine tasks across three categories. Using the suite, we evaluate the relative performance of seven popular instruction selection methods over five metrics relevant to ICL. Our experiments reveal that using curated manually-written instructions or simple instructions without any task-specific descriptions often elicits superior ICL performance overall than that of automatic instruction-induction methods, pointing to a lack of generalizability among the latter. We release our evaluation suite for benchmarking instruction selection approaches and enabling more generalizable methods in this space.
△ Less
Submitted 16 July, 2023; v1 submitted 1 July, 2023;
originally announced July 2023.
-
Sampling Individually-Fair Rankings that are Always Group Fair
Authors:
Sruthi Gorantla,
Anay Mehrotra,
Amit Deshpande,
Anand Louis
Abstract:
Rankings on online platforms help their end-users find the relevant information -- people, news, media, and products -- quickly. Fair ranking tasks, which ask to rank a set of items to maximize utility subject to satisfying group-fairness constraints, have gained significant interest in the Algorithmic Fairness, Information Retrieval, and Machine Learning literature. Recent works, however, identif…
▽ More
Rankings on online platforms help their end-users find the relevant information -- people, news, media, and products -- quickly. Fair ranking tasks, which ask to rank a set of items to maximize utility subject to satisfying group-fairness constraints, have gained significant interest in the Algorithmic Fairness, Information Retrieval, and Machine Learning literature. Recent works, however, identify uncertainty in the utilities of items as a primary cause of unfairness and propose introducing randomness in the output. This randomness is carefully chosen to guarantee an adequate representation of each item (while accounting for the uncertainty). However, due to this randomness, the output rankings may violate group fairness constraints. We give an efficient algorithm that samples rankings from an individually-fair distribution while ensuring that every output ranking is group fair. The expected utility of the output ranking is at least $α$ times the utility of the optimal fair solution. Here, $α$ depends on the utilities, position-discounts, and constraints -- it approaches 1 as the range of utilities or the position-discounts shrinks, or when utilities satisfy distributional assumptions. Empirically, we observe that our algorithm achieves individual and group fairness and that Pareto dominates the state-of-the-art baselines.
△ Less
Submitted 20 June, 2023;
originally announced June 2023.
-
Causal Effect Regularization: Automated Detection and Removal of Spurious Attributes
Authors:
Abhinav Kumar,
Amit Deshpande,
Amit Sharma
Abstract:
In many classification datasets, the task labels are spuriously correlated with some input attributes. Classifiers trained on such datasets often rely on these attributes for prediction, especially when the spurious correlation is high, and thus fail to generalize whenever there is a shift in the attributes' correlation at deployment. If we assume that the spurious attributes are known a priori, s…
▽ More
In many classification datasets, the task labels are spuriously correlated with some input attributes. Classifiers trained on such datasets often rely on these attributes for prediction, especially when the spurious correlation is high, and thus fail to generalize whenever there is a shift in the attributes' correlation at deployment. If we assume that the spurious attributes are known a priori, several methods have been proposed to learn a classifier that is invariant to the specified attributes. However, in real-world data, information about spurious attributes is typically unavailable. Therefore, we propose a method to automatically identify spurious attributes by estimating their causal effect on the label and then use a regularization objective to mitigate the classifier's reliance on them. Compared to a recent method for identifying spurious attributes, we find that our method is more accurate in removing the attribute from the learned model, especially when spurious correlation is high. Specifically, across synthetic, semi-synthetic, and real-world datasets, our method shows significant improvement in a metric used to quantify the dependence of a classifier on spurious attributes ($Δ$Prob), while obtaining better or similar accuracy. In addition, our method mitigates the reliance on spurious attributes even under noisy estimation of causal effects. To explain the empirical robustness of our method, we create a simple linear classification task with two sets of attributes: causal and spurious. We prove that our method only requires that the ranking of estimated causal effects is correct across attributes to select the correct classifier.
△ Less
Submitted 7 December, 2023; v1 submitted 19 June, 2023;
originally announced June 2023.
-
C-STS: Conditional Semantic Textual Similarity
Authors:
Ameet Deshpande,
Carlos E. Jimenez,
Howard Chen,
Vishvak Murahari,
Victoria Graf,
Tanmay Rajpurohit,
Ashwin Kalyan,
Danqi Chen,
Karthik Narasimhan
Abstract:
Semantic textual similarity (STS), a cornerstone task in NLP, measures the degree of similarity between a pair of sentences, and has broad application in fields such as information retrieval and natural language understanding. However, sentence similarity can be inherently ambiguous, depending on the specific aspect of interest. We resolve this ambiguity by proposing a novel task called Conditiona…
▽ More
Semantic textual similarity (STS), a cornerstone task in NLP, measures the degree of similarity between a pair of sentences, and has broad application in fields such as information retrieval and natural language understanding. However, sentence similarity can be inherently ambiguous, depending on the specific aspect of interest. We resolve this ambiguity by proposing a novel task called Conditional STS (C-STS) which measures sentences' similarity conditioned on an feature described in natural language (hereon, condition). As an example, the similarity between the sentences "The NBA player shoots a three-pointer." and "A man throws a tennis ball into the air to serve." is higher for the condition "The motion of the ball" (both upward) and lower for "The size of the ball" (one large and one small). C-STS's advantages are two-fold: (1) it reduces the subjectivity and ambiguity of STS and (2) enables fine-grained language model evaluation through diverse natural language conditions. We put several state-of-the-art models to the test, and even those performing well on STS (e.g. SimCSE, Flan-T5, and GPT-4) find C-STS challenging; all with Spearman correlation scores below 50. To encourage a more comprehensive evaluation of semantic similarity and natural language understanding, we make nearly 19K C-STS examples and code available for others to train and test their models.
△ Less
Submitted 6 November, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Anthropomorphization of AI: Opportunities and Risks
Authors:
Ameet Deshpande,
Tanmay Rajpurohit,
Karthik Narasimhan,
Ashwin Kalyan
Abstract:
Anthropomorphization is the tendency to attribute human-like traits to non-human entities. It is prevalent in many social contexts -- children anthropomorphize toys, adults do so with brands, and it is a literary device. It is also a versatile tool in science, with behavioral psychology and evolutionary biology meticulously documenting its consequences. With widespread adoption of AI systems, and…
▽ More
Anthropomorphization is the tendency to attribute human-like traits to non-human entities. It is prevalent in many social contexts -- children anthropomorphize toys, adults do so with brands, and it is a literary device. It is also a versatile tool in science, with behavioral psychology and evolutionary biology meticulously documenting its consequences. With widespread adoption of AI systems, and the push from stakeholders to make it human-like through alignment techniques, human voice, and pictorial avatars, the tendency for users to anthropomorphize it increases significantly. We take a dyadic approach to understanding this phenomenon with large language models (LLMs) by studying (1) the objective legal implications, as analyzed through the lens of the recent blueprint of AI bill of rights and the (2) subtle psychological aspects customization and anthropomorphization. We find that anthropomorphized LLMs customized for different user bases violate multiple provisions in the legislative blueprint. In addition, we point out that anthropomorphization of LLMs affects the influence they can have on their users, thus having the potential to fundamentally change the nature of human-AI interaction, with potential for manipulation and negative influence. With LLMs being hyper-personalized for vulnerable groups like children and patients among others, our work is a timely and important contribution. We propose a conservative strategy for the cautious use of anthropomorphization to improve trustworthiness of AI systems.
△ Less
Submitted 24 May, 2023;
originally announced May 2023.
-
Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
Authors:
Ameet Deshpande,
Vishvak Murahari,
Tanmay Rajpurohit,
Ashwin Kalyan,
Karthik Narasimhan
Abstract:
Large language models (LLMs) have shown incredible capabilities and transcended the natural language processing (NLP) community, with adoption throughout many services like healthcare, therapy, education, and customer service. Since users include people with critical information needs like students or patients engaging with chatbots, the safety of these systems is of prime importance. Therefore, a…
▽ More
Large language models (LLMs) have shown incredible capabilities and transcended the natural language processing (NLP) community, with adoption throughout many services like healthcare, therapy, education, and customer service. Since users include people with critical information needs like students or patients engaging with chatbots, the safety of these systems is of prime importance. Therefore, a clear understanding of the capabilities and limitations of LLMs is necessary. To this end, we systematically evaluate toxicity in over half a million generations of ChatGPT, a popular dialogue-based LLM. We find that setting the system parameter of ChatGPT by assigning it a persona, say that of the boxer Muhammad Ali, significantly increases the toxicity of generations. Depending on the persona assigned to ChatGPT, its toxicity can increase up to 6x, with outputs engaging in incorrect stereotypes, harmful dialogue, and hurtful opinions. This may be potentially defamatory to the persona and harmful to an unsuspecting user. Furthermore, we find concerning patterns where specific entities (e.g., certain races) are targeted more than others (3x more) irrespective of the assigned persona, that reflect inherent discriminatory biases in the model. We hope that our findings inspire the broader AI community to rethink the efficacy of current safety guardrails and develop better techniques that lead to robust, safe, and trustworthy AI systems.
△ Less
Submitted 11 April, 2023;
originally announced April 2023.
-
Cherry-Picking with Reinforcement Learning : Robust Dynamic Grasping in Unstable Conditions
Authors:
Yunchu Zhang,
Liyiming Ke,
Abhay Deshpande,
Abhishek Gupta,
Siddhartha Srinivasa
Abstract:
Grasping small objects surrounded by unstable or non-rigid material plays a crucial role in applications such as surgery, harvesting, construction, disaster recovery, and assisted feeding. This task is especially difficult when fine manipulation is required in the presence of sensor noise and perception errors; errors inevitably trigger dynamic motion, which is challenging to model precisely. Circ…
▽ More
Grasping small objects surrounded by unstable or non-rigid material plays a crucial role in applications such as surgery, harvesting, construction, disaster recovery, and assisted feeding. This task is especially difficult when fine manipulation is required in the presence of sensor noise and perception errors; errors inevitably trigger dynamic motion, which is challenging to model precisely. Circumventing the difficulty to build accurate models for contacts and dynamics, data-driven methods like reinforcement learning (RL) can optimize task performance via trial and error, reducing the need for accurate models of contacts and dynamics. Applying RL methods to real robots, however, has been hindered by factors such as prohibitively high sample complexity or the high training infrastructure cost for providing resets on hardware. This work presents CherryBot, an RL system that uses chopsticks for fine manipulation that surpasses human reactiveness for some dynamic grasping tasks. By integrating imprecise simulators, suboptimal demonstrations and external state estimation, we study how to make a real-world robot learning system sample efficient and general while reducing the human effort required for supervision. Our system shows continual improvement through 30 minutes of real-world interaction: through reactive retry, it achieves an almost 100% success rate on the demanding task of using chopsticks to grasp small objects swinging in the air. We demonstrate the reactiveness, robustness and generalizability of CherryBot to varying object shapes and dynamics (e.g., external disturbances like wind and human perturbations). Videos are available at https://goodcherrybot.github.io/.
△ Less
Submitted 28 June, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
-
DeepCPG Policies for Robot Locomotion
Authors:
Aditya M. Deshpande,
Eric Hurd,
Ali A. Minai,
Manish Kumar
Abstract:
Central Pattern Generators (CPGs) form the neural basis of the observed rhythmic behaviors for locomotion in legged animals. The CPG dynamics organized into networks allow the emergence of complex locomotor behaviors. In this work, we take this inspiration for developing walking behaviors in multi-legged robots. We present novel DeepCPG policies that embed CPGs as a layer in a larger neural networ…
▽ More
Central Pattern Generators (CPGs) form the neural basis of the observed rhythmic behaviors for locomotion in legged animals. The CPG dynamics organized into networks allow the emergence of complex locomotor behaviors. In this work, we take this inspiration for developing walking behaviors in multi-legged robots. We present novel DeepCPG policies that embed CPGs as a layer in a larger neural network and facilitate end-to-end learning of locomotion behaviors in deep reinforcement learning (DRL) setup. We demonstrate the effectiveness of this approach on physics engine-based insectoid robots. We show that, compared to traditional approaches, DeepCPG policies allow sample-efficient end-to-end learning of effective locomotion strategies even in the case of high-dimensional sensor spaces (vision). We scale the DeepCPG policies using a modular robot configuration and multi-agent DRL. Our results suggest that gradual complexification with embedded priors of these policies in a modular fashion could achieve non-trivial sensor and motor integration on a robot platform. These results also indicate the efficacy of bootstrapping more complex intelligent systems from simpler ones based on biological principles. Finally, we present the experimental results for a proof-of-concept insectoid robot system for which DeepCPG learned policies initially using the simulation engine and these were afterwards transferred to real-world robots without any additional fine-tuning.
△ Less
Submitted 25 February, 2023;
originally announced February 2023.
-
MUX-PLMs: Data Multiplexing for High-throughput Language Models
Authors:
Vishvak Murahari,
Ameet Deshpande,
Carlos E. Jimenez,
Izhak Shafran,
Mingqiu Wang,
Yuan Cao,
Karthik Narasimhan
Abstract:
The widespread adoption of large language models such as ChatGPT and Bard has led to unprecedented demand for these technologies. The burgeoning cost of inference for ever-increasing model sizes coupled with hardware shortages has limited affordable access and poses a pressing need for efficiency approaches geared towards high throughput and performance. Multi-input multi-output (MIMO) algorithms…
▽ More
The widespread adoption of large language models such as ChatGPT and Bard has led to unprecedented demand for these technologies. The burgeoning cost of inference for ever-increasing model sizes coupled with hardware shortages has limited affordable access and poses a pressing need for efficiency approaches geared towards high throughput and performance. Multi-input multi-output (MIMO) algorithms such as data multiplexing, offer a promising solution with a many-fold increase in throughput by performing inference for multiple inputs at the cost of a single input. Yet these approaches are not currently performant enough to be deployed in modern systems. We change that by developing MUX-PLMs, a class of high throughput pre-trained language models (PLMs) trained with data multiplexing, that can be fine-tuned for any downstream task to yield high-throughput high-performance. Our novel multiplexing and demultiplexing modules proficiently entangle and disentangle inputs, and enable high-performance high throughput \muxplms{} that are competitive with vanilla PLMs while achieving 2x/5x inference speedup with only a $1-4\%$ drop on a broad suite of tasks.
△ Less
Submitted 22 May, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
-
On Comparing Fair Classifiers under Data Bias
Authors:
Mohit Sharma,
Amit Deshpande,
Rajiv Ratn Shah
Abstract:
In this paper, we consider a theoretical model for injecting data bias, namely, under-representation and label bias (Blum & Stangl, 2019). We empirically study the effect of varying data biases on the accuracy and fairness of fair classifiers. Through extensive experiments on both synthetic and real-world datasets (e.g., Adult, German Credit, Bank Marketing, COMPAS), we empirically audit pre-, in-…
▽ More
In this paper, we consider a theoretical model for injecting data bias, namely, under-representation and label bias (Blum & Stangl, 2019). We empirically study the effect of varying data biases on the accuracy and fairness of fair classifiers. Through extensive experiments on both synthetic and real-world datasets (e.g., Adult, German Credit, Bank Marketing, COMPAS), we empirically audit pre-, in-, and post-processing fair classifiers from standard fairness toolkits for their fairness and accuracy by injecting varying amounts of under-representation and label bias in their training data (but not the test data). Our main observations are: 1. The fairness and accuracy of many standard fair classifiers degrade severely as the bias injected in their training data increases, 2. A simple logistic regression model trained on the right data can often outperform, in both accuracy and fairness, most fair classifiers trained on biased training data, and 3. A few, simple fairness techniques (e.g., reweighing, exponentiated gradients) seem to offer stable accuracy and fairness guarantees even when their training data is injected with under-representation and label bias. Our experiments also show how to integrate a measure of data bias risk in the existing fairness dashboards for real-world deployments.
△ Less
Submitted 10 December, 2023; v1 submitted 12 February, 2023;
originally announced February 2023.
-
SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification
Authors:
Pranjal Aggarwal,
Ameet Deshpande,
Karthik Narasimhan
Abstract:
Extreme classification (XC) involves predicting over large numbers of classes (thousands to millions), with real-world applications like news article classification and e-commerce product tagging. The zero-shot version of this task requires generalization to novel classes without additional supervision. In this paper, we develop SemSup-XC, a model that achieves state-of-the-art zero-shot and few-s…
▽ More
Extreme classification (XC) involves predicting over large numbers of classes (thousands to millions), with real-world applications like news article classification and e-commerce product tagging. The zero-shot version of this task requires generalization to novel classes without additional supervision. In this paper, we develop SemSup-XC, a model that achieves state-of-the-art zero-shot and few-shot performance on three XC datasets derived from legal, e-commerce, and Wikipedia data. To develop SemSup-XC, we use automatically collected semantic class descriptions to represent classes and facilitate generalization through a novel hybrid matching module that matches input instances to class descriptions using a combination of semantic and lexical similarity. Trained with contrastive learning, SemSup-XC significantly outperforms baselines and establishes state-of-the-art performance on all three datasets considered, gaining up to 12 precision points on zero-shot and more than 10 precision points on one-shot tests, with similar gains for recall@10. Our ablation studies highlight the relative importance of our hybrid matching module and automatically collected class descriptions.
△ Less
Submitted 22 June, 2023; v1 submitted 26 January, 2023;
originally announced January 2023.