-
A Framework for Testing and Adapting REST APIs as LLM Tools
Authors:
Jayachandu Bandlamudi,
Ritwik Chaudhuri,
Neelamadhav Gantayat,
Kushal Mukherjee,
Prerna Agarwal,
Renuka Sindhgatta,
Sameep Mehta
Abstract:
Large Language Models (LLMs) are enabling autonomous agents to perform complex workflows using external tools or functions, often provided via REST APIs in enterprise systems. However, directly utilizing these APIs as tools poses challenges due to their complex input schemas, elaborate responses, and often ambiguous documentation. Current benchmarks for tool testing do not adequately address these complexities, leading to a critical gap in evaluating API readiness for agent-driven automation. In this work, we present a novel testing framework aimed at evaluating and enhancing the readiness of REST APIs to function as tools for LLM-based agents. Our framework transforms APIs into tools, generates comprehensive test cases for the APIs, translates test cases into natural language instructions suitable for agents, enriches tool definitions, and evaluates the agent's ability to correctly invoke the API and process its inputs and responses. To provide actionable insights, we analyze the outcomes of 750 test cases, presenting a detailed taxonomy of errors, including input misinterpretation, output handling inconsistencies, and schema mismatches. Additionally, we classify these test cases to streamline debugging and refinement of tool integrations. This work offers a foundational step toward enabling enterprise APIs as tools, improving their usability in agent-based applications.
Submitted 1 May, 2025; v1 submitted 21 April, 2025;
originally announced April 2025.
-
Z-REx: Human-Interpretable GNN Explanations for Real Estate Recommendations
Authors:
Kunal Mukherjee,
Zachary Harrison,
Saeid Balaneshin
Abstract:
Transparency and interpretability are crucial for enhancing customer confidence and user engagement, especially when dealing with black-box Machine Learning (ML)-based recommendation systems. Modern recommendation systems leverage Graph Neural Networks (GNNs) due to their ability to produce high-quality recommendations in terms of both relevance and diversity. Therefore, the explainability of GNNs is especially important for Link Prediction (LP) tasks, since recommending relevant items can be viewed as predicting links between users and items. Although GNN explainability is a well-studied field, existing methods primarily focus on node or graph-level tasks, leaving a gap in LP explanation techniques.
This work introduces Z-REx, a GNN explanation framework designed explicitly for heterogeneous link prediction tasks. Z-REx utilizes structural and attribute perturbation to identify critical sub-structures and important features while reducing the search space by leveraging domain-specific knowledge. In our experimentation, we show the efficacy of Z-REx in generating contextually relevant and human-interpretable explanations for ZiGNN, a GNN-based recommendation engine, using a real-world real-estate dataset from Zillow Group, Inc. We also compare Z-REx to State-of-The-Art (SOTA) GNN explainers to show Z-REx's superiority in producing high-quality human-interpretable explanations.
Submitted 11 February, 2025;
originally announced March 2025.
-
From Selection to Generation: A Survey of LLM-based Active Learning
Authors:
Yu Xia,
Subhojyoti Mukherjee,
Zhouhang Xie,
Junda Wu,
Xintong Li,
Ryan Aponte,
Hanjia Lyu,
Joe Barrow,
Hongjie Chen,
Franck Dernoncourt,
Branislav Kveton,
Tong Yu,
Ruiyi Zhang,
Jiuxiang Gu,
Nesreen K. Ahmed,
Yu Wang,
Xiang Chen,
Hanieh Deilamsalehy,
Sungchul Kim,
Zhengmian Hu,
Yue Zhao,
Nedim Lipka,
Seunghyun Yoon,
Ting-Hao Kenneth Huang,
Zichao Wang
, et al. (9 additional authors not shown)
Abstract:
Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce an intuitive taxonomy that categorizes these techniques and discuss the transformative roles LLMs can play in the active learning loop. We further examine the impact of AL on LLM learning paradigms and its applications across various domains. Finally, we identify open challenges and propose future research directions. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques and deploy them to new applications.
Submitted 17 February, 2025;
originally announced February 2025.
-
Humanity's Last Exam
Authors:
Long Phan,
Alice Gatti,
Ziwen Han,
Nathaniel Li,
Josephina Hu,
Hugh Zhang,
Chen Bo Calvin Zhang,
Mohamed Shaaban,
John Ling,
Sean Shi,
Michael Choi,
Anish Agrawal,
Arnav Chopra,
Adam Khoja,
Ryan Kim,
Richard Ren,
Jason Hausenloy,
Oliver Zhang,
Mantas Mazeika,
Dmitry Dodonov,
Tung Nguyen,
Jaeho Lee,
Daron Anderson,
Mikhail Doroshenko,
Alun Cennyth Stokes
, et al. (1084 additional authors not shown)
Abstract:
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam (HLE), a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. HLE consists of 2,500 questions across dozens of subjects, including mathematics, humanities, and the natural sciences. HLE is developed globally by subject-matter experts and consists of multiple-choice and short-answer questions suitable for automated grading. Each question has a known solution that is unambiguous and easily verifiable, but cannot be quickly answered via internet retrieval. State-of-the-art LLMs demonstrate low accuracy and calibration on HLE, highlighting a significant gap between current LLM capabilities and the expert human frontier on closed-ended academic questions. To inform research and policymaking upon a clear understanding of model capabilities, we publicly release HLE at https://lastexam.ai.
Submitted 19 April, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from Related Example Banks
Authors:
Soumya Suvra Ghosal,
Soumyabrata Pal,
Koyel Mukherjee,
Dinesh Manocha
Abstract:
Large Language Models (LLMs) have recently demonstrated impressive few-shot learning capabilities through in-context learning (ICL). However, ICL performance is highly dependent on the choice of few-shot demonstrations, making the selection of optimal examples a persistent research challenge. This issue is further amplified in low-resource Indic languages, where the scarcity of ground-truth data complicates the selection process. In this work, we propose PromptRefine, a novel Alternating Minimization approach for example selection that improves ICL performance on low-resource Indic languages. PromptRefine leverages auxiliary example banks from related high-resource Indic languages and employs multi-task learning techniques to align language-specific retrievers, enabling effective cross-language retrieval. Additionally, we incorporate diversity in the selected examples to enhance generalization and reduce bias. Through comprehensive evaluations on four text generation tasks (Cross-Lingual Question Answering, Multilingual Question Answering, Machine Translation, and Cross-Lingual Summarization) using state-of-the-art LLMs such as LLAMA-3.1-8B, LLAMA-2-7B, Qwen-2-7B, and Qwen-2.5-7B, we demonstrate that PromptRefine significantly outperforms existing frameworks for retrieving examples.
Submitted 7 December, 2024;
originally announced December 2024.
-
FiRST: Finetuning Router-Selective Transformers for Input-Adaptive Latency Reduction
Authors:
Akriti Jain,
Saransh Sharma,
Koyel Mukherjee,
Soumyabrata Pal
Abstract:
Auto-regressive Large Language Models (LLMs) demonstrate remarkable performance across different domains such as vision and language processing. However, due to sequential processing through a stack of transformer layers, autoregressive decoding faces significant computation/latency challenges, particularly in resource-constrained environments like mobile and edge devices. Existing approaches in the literature that aim to improve latency by skipping layers come in two distinct flavors: 1) early exit, and 2) input-agnostic heuristics where tokens exit at pre-determined layers irrespective of the input sequence. Both strategies have limitations: the former cannot be applied to handle the KV caching necessary for speed-ups in modern frameworks, and the latter does not capture the variation in layer importance across tasks or, more generally, across input sequences. To address both limitations, we propose FiRST, an algorithm that reduces inference latency by using layer-specific routers to select a subset of transformer layers adaptively for each input sequence - the prompt (during the prefill stage) decides which layers will be skipped during decoding. FiRST preserves compatibility with KV caching, enabling faster inference while being quality-aware. FiRST is model-agnostic and can be easily enabled on any pre-trained LLM. Our approach reveals that input adaptivity is critical - indeed, different task-specific middle layers play a crucial role in evolving hidden representations depending on tasks. Extensive experiments show that FiRST significantly reduces latency while outperforming other layer selection strategies in quality metrics. It retains performance competitive with the base model (without layer skipping) and, in some cases, even improves upon it. FiRST is thus a promising and efficient solution for LLM deployment in low-resource environments.
Submitted 17 December, 2024; v1 submitted 16 October, 2024;
originally announced October 2024.
-
Large Language Models estimate fine-grained human color-concept associations
Authors:
Kushin Mukherjee,
Timothy T. Rogers,
Karen B. Schloss
Abstract:
Concepts, both abstract and concrete, elicit a distribution of association strengths across perceptual color space, which influence aspects of visual cognition ranging from object recognition to interpretation of information visualizations. While prior work has hypothesized that color-concept associations may be learned from the cross-modal statistical structure of experience, it has been unclear whether natural environments possess such structure or, if so, whether learning systems are capable of discovering and exploiting it without strong prior constraints. We addressed these questions by investigating the ability of GPT-4, a multimodal large language model, to estimate human-like color-concept associations without any additional training. Starting with human color-concept association ratings for a set of 71 colors spanning perceptual color space (UW-71) and concepts that varied in abstractness, we assessed how well association ratings generated by GPT-4 could predict human ratings. GPT-4 ratings were correlated with human ratings, with performance comparable to state-of-the-art methods for automatically estimating color-concept associations from images. Variability in GPT-4's performance across concepts could be explained by specificity of the concept's color-concept association distribution. This study suggests that high-order covariances between language and perception, as expressed in the natural environment of the internet, contain sufficient information to support learning of human-like color-concept associations, and provides an existence proof that a learning system can encode such associations without initial constraints. The work further shows that GPT-4 can be used to efficiently estimate distributions of color associations for a broad range of concepts, potentially serving as a critical tool for designing effective and intuitive information visualizations.
Submitted 4 May, 2024;
originally announced June 2024.
-
To Store or Not to Store: a graph theoretical approach for Dataset Versioning
Authors:
Anxin Guo,
Jingwei Li,
Pattara Sukprasert,
Samir Khuller,
Amol Deshpande,
Koyel Mukherjee
Abstract:
In this work, we study the cost efficient data versioning problem, where the goal is to optimize the storage and reconstruction (retrieval) costs of data versions, given a graph of datasets as nodes and edges capturing edit/delta information. One central variant we study is MinSum Retrieval (MSR) where the goal is to minimize the total retrieval costs, while keeping the storage costs bounded. This problem (along with its variants) was introduced by Bhattacherjee et al. [VLDB'15]. While such problems are frequently encountered in collaborative tools (e.g., version control systems and data analysis pipelines), to the best of our knowledge, no existing research studies the theoretical aspects of these problems.
We establish that the currently best-known heuristic, LMG, can perform arbitrarily badly in a simple worst case. Moreover, we show that it is hard to get $o(n)$-approximation for MSR on general graphs even if we relax the storage constraints by an $O(\log n)$ factor. Similar hardness results are shown for other variants. Meanwhile, we propose poly-time approximation schemes for tree-like graphs, motivated by the fact that the graphs arising in practice from typical edit operations are often not arbitrary. As version graphs typically have low treewidth, we further develop new algorithms for bounded treewidth graphs.
Furthermore, we propose two new heuristics and evaluate them empirically. First, we extend LMG by considering more potential "moves", to propose a new heuristic LMG-All. LMG-All consistently outperforms LMG while having comparable run time on a wide variety of datasets, i.e., version graphs. Secondly, we apply our tree algorithms on the minimum-storage arborescence of an instance, yielding algorithms that are qualitatively better than all previous heuristics for MSR, as well as for another variant BoundedMin Retrieval (BMR).
Submitted 18 February, 2024;
originally announced February 2024.
-
Towards Optimizing the Costs of LLM Usage
Authors:
Shivanshu Shekhar,
Tanishq Dubey,
Koyel Mukherjee,
Apoorv Saxena,
Atharv Tyagi,
Nishanth Kotla
Abstract:
Generative AI and LLMs in particular are heavily used nowadays for various document processing tasks such as question answering and summarization. However, different LLMs come with different capabilities for different tasks as well as with different costs, tokenization, and latency. In fact, enterprises are already incurring huge costs of operating or using LLMs for their respective use cases.
In this work, we propose optimizing the usage costs of LLMs by estimating their output quality (without actually invoking the LLMs), and then solving an optimization routine for the LLM selection to either keep costs under a budget, or minimize the costs, in a quality and latency aware manner. We propose a model to predict the output quality of LLMs on document processing tasks like summarization, followed by an LP rounding algorithm to optimize the selection of LLMs. We study optimization problems trading off the quality and costs, both theoretically and empirically. We further propose a sentence simplification model for reducing the number of tokens in a controlled manner. Additionally, we propose several deterministic heuristics for reducing tokens in a quality aware manner, and study the related optimization problem of applying the heuristics optimizing the quality and cost trade-off. We perform extensive empirical validation of our methods on not only enterprise datasets but also on open-source datasets, annotated by us, and show that we perform much better than the closest baselines. Our methods reduce costs by 40%-90% while improving quality by 4%-7%. We will release the annotated open source datasets to the community for further research and exploration.
Submitted 29 January, 2024;
originally announced February 2024.
-
R2D2: Reducing Redundancy and Duplication in Data Lakes
Authors:
Raunak Shah,
Koyel Mukherjee,
Atharv Tyagi,
Sai Keerthana Karnam,
Dhruv Joshi,
Shivam Bhosale,
Subrata Mitra
Abstract:
Enterprise data lakes often suffer from substantial amounts of duplicate and redundant data, with data volumes ranging from terabytes to petabytes. This leads to both increased storage costs and unnecessarily high maintenance costs for these datasets. In this work, we focus on identifying and reducing redundancy in enterprise data lakes by addressing the problem of 'dataset containment'. To the best of our knowledge, this is one of the first works that addresses table-level containment at a large scale.
We propose R2D2: a three-step hierarchical pipeline that efficiently identifies almost all instances of containment by progressively reducing the search space in the data lake. It first builds (i) a schema containment graph, followed by (ii) statistical min-max pruning, and finally, (iii) content level pruning. We further propose minimizing the total storage and access costs by optimally identifying redundant datasets that can be deleted (and reconstructed on demand) while respecting latency constraints.
We implement our system on Azure Databricks clusters using Apache Spark for enterprise data stored in ADLS Gen2, and on AWS clusters for open-source data. In contrast to existing modified baselines that are inaccurate or take several days to run, our pipeline can process an enterprise customer data lake at the TB scale in approximately 5 hours with high accuracy. We present theoretical results as well as extensive empirical validation on both enterprise (scale of TBs) and open-source datasets (scale of MBs - GBs), which showcase the effectiveness of our pipeline.
Submitted 20 December, 2023;
originally announced December 2023.
-
Approximate Caching for Efficiently Serving Diffusion Models
Authors:
Shubham Agarwal,
Subrata Mitra,
Sarthak Chakraborty,
Srikrishna Karanam,
Koyel Mukherjee,
Shiv Saini
Abstract:
Text-to-image generation using diffusion models has seen explosive popularity owing to their ability to produce high quality images adhering to text prompts. However, production-grade diffusion model serving is a resource intensive task that not only requires high-end GPUs, which are expensive, but also incurs considerable latency. In this paper, we introduce a technique called approximate-caching that can reduce such iterative denoising steps for an image generation based on a prompt by reusing intermediate noise states created during a prior image generation for similar prompts. Based on this idea, we present an end-to-end text-to-image system, Nirvana, that uses approximate-caching with a novel cache management policy, Least Computationally Beneficial and Frequently Used (LCBFU), to provide % GPU compute savings, 19.8% end-to-end latency reduction and 19% dollar savings, on average, on two real production workloads. We further present an extensive characterization of real production text-to-image prompts from the perspective of caching, popularity and reuse of intermediate states in a large production environment.
Submitted 7 December, 2023;
originally announced December 2023.
-
SEVA: Leveraging sketches to evaluate alignment between human and machine visual abstraction
Authors:
Kushin Mukherjee,
Holly Huey,
Xuanchen Lu,
Yael Vinker,
Rio Aguina-Kang,
Ariel Shamir,
Judith E. Fan
Abstract:
Sketching is a powerful tool for creating abstract images that are sparse but meaningful. Sketch understanding poses fundamental challenges for general-purpose vision algorithms because it requires robustness to the sparsity of sketches relative to natural visual inputs and because it demands tolerance for semantic ambiguity, as sketches can reliably evoke multiple meanings. While current vision algorithms have achieved high performance on a variety of visual tasks, it remains unclear to what extent they understand sketches in a human-like way. Here we introduce SEVA, a new benchmark dataset containing approximately 90K human-generated sketches of 128 object concepts produced under different time constraints, and thus systematically varying in sparsity. We evaluated a suite of state-of-the-art vision algorithms on their ability to correctly identify the target concept depicted in these sketches and to generate responses that are strongly aligned with human response patterns on the same sketch recognition task. We found that vision algorithms that better predicted human sketch recognition performance also better approximated human uncertainty about sketch meaning, but there remains a sizable gap between model and human response patterns. To explore the potential of models that emulate human visual abstraction in generative tasks, we conducted further evaluations of a recently developed sketch generation algorithm (Vinker et al., 2022) capable of generating sketches that vary in sparsity. We hope that public release of this dataset and evaluation protocol will catalyze progress towards algorithms with enhanced capacities for human-like visual abstraction.
Submitted 5 December, 2023;
originally announced December 2023.
-
Interpreting GNN-based IDS Detections Using Provenance Graph Structural Features
Authors:
Kunal Mukherjee,
Joshua Wiedemeier,
Tianhao Wang,
Muhyun Kim,
Feng Chen,
Murat Kantarcioglu,
Kangkook Jee
Abstract:
Advanced cyber threats (e.g., Fileless Malware and Advanced Persistent Threat (APT)) have driven the adoption of provenance-based security solutions. These solutions employ Machine Learning (ML) models for behavioral modeling and critical security tasks such as malware and anomaly detection. However, the opacity of ML-based security models limits their broader adoption, as the lack of transparency in their decision-making processes restricts explainability and verifiability. We tailored our solution towards Graph Neural Network (GNN)-based security solutions since recent studies employ GNNs to comprehensively digest system provenance graphs for security critical tasks.
To enhance the explainability of GNN-based security models, we introduce PROVEXPLAINER, a framework offering instance-level security-aware explanations using an interpretable surrogate model. PROVEXPLAINER's interpretable feature space consists of discriminant subgraph patterns and graph structural features, which can be directly mapped to the system provenance problem space, making the explanations human understandable. By considering prominent GNN architectures (e.g., GAT and GraphSAGE) for anomaly detection tasks, we show how PROVEXPLAINER synergizes with current state-of-the-art (SOTA) GNN explainers to deliver domain and instance-specific explanations. We measure the explanation quality using the fidelity+/fidelity- metric as used by traditional GNN explanation literature, and we incorporate the precision/recall metric where we consider the accuracy of the explanation against the ground truth. On malware and APT datasets, PROVEXPLAINER achieves up to 29%/27%/25% higher fidelity+, precision and recall, and 12% lower fidelity- respectively, compared to SOTA GNN explainers.
Submitted 16 December, 2024; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Towards Optimizing Storage Costs on the Cloud
Authors:
Koyel Mukherjee,
Raunak Shah,
Shiv Kumar Saini,
Karanpreet Singh,
Khushi,
Harsh Kesarwani,
Kavya Barnwal,
Ayush Chauhan
Abstract:
We study the problem of optimizing data storage and access costs on the cloud while ensuring that the desired performance or latency is unaffected. We first propose an optimizer that optimizes the data placement tier (on the cloud) and the choice of compression schemes to apply, for given data partitions with temporal access predictions. Secondly, we propose a model to learn the compression performance of multiple algorithms across data partitions in different formats to generate compression performance predictions on the fly, as inputs to the optimizer. Thirdly, we propose to approach the data partitioning problem fundamentally differently than the current default in most data lakes where partitioning is in the form of ingestion batches. We propose access pattern aware data partitioning and formulate an optimization problem that optimizes the size and reading costs of partitions subject to access patterns.
We study the various optimization problems theoretically as well as empirically, and provide theoretical bounds as well as hardness results. We propose a unified pipeline of cost minimization, called SCOPe that combines the different modules. We extensively compare the performance of our methods with related baselines from the literature on TPC-H data as well as enterprise datasets (ranging from GB to PB in volume) and show that SCOPe substantially improves over the baselines. We show significant cost savings compared to platform baselines, of the order of 50% to 83% on enterprise Data Lake datasets that range from terabytes to petabytes in volume.
Submitted 6 July, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Semantic Feature Verification in FLAN-T5
Authors:
Siddharth Suresh,
Kushin Mukherjee,
Timothy T. Rogers
Abstract:
This study evaluates the potential of a large language model for aiding in generation of semantic feature norms - a critical tool for evaluating conceptual structure in cognitive science. Building from an existing human-generated dataset, we show that machine-verified norms capture aspects of conceptual structure beyond what is expressed in human norms alone, and better explain human judgments of semantic similarity amongst items that are distally related. The results suggest that LLMs can greatly enhance traditional methods of semantic feature norm verification, with implications for our understanding of conceptual representation in humans and machines.
Submitted 11 April, 2023;
originally announced April 2023.
-
Human-machine cooperation for semantic feature listing
Authors:
Kushin Mukherjee,
Siddharth Suresh,
Timothy T. Rogers
Abstract:
Semantic feature norms, lists of features that concepts do and do not possess, have played a central role in characterizing human conceptual knowledge, but require extensive human labor. Large language models (LLMs) offer a novel avenue for the automatic generation of such feature lists, but are prone to significant error. Here, we present a new method for combining a learned model of human lexical-semantics from limited data with LLM-generated data to efficiently generate high-quality feature norms.
Submitted 11 April, 2023;
originally announced April 2023.
-
Conceptual structure coheres in human cognition but not in large language models
Authors:
Siddharth Suresh,
Kushin Mukherjee,
Xizheng Yu,
Wei-Chun Huang,
Lisa Padua,
Timothy T Rogers
Abstract:
Neural network models of language have long been used as a tool for developing hypotheses about conceptual representation in the mind and brain. For many years, such use involved extracting vector-space representations of words and using distances among these to predict or understand human behavior in various semantic tasks. Contemporary large language models (LLMs), however, make it possible to interrogate the latent structure of conceptual representations using experimental methods nearly identical to those commonly used with human participants. The current work utilizes three common techniques borrowed from cognitive psychology to estimate and compare the structure of concepts in humans and a suite of LLMs. In humans, we show that conceptual structure is robust to differences in culture, language, and method of estimation. Structures estimated from LLM behavior, while individually fairly consistent with those estimated from human behavior, vary much more depending upon the particular task used to generate responses--across tasks, estimates of conceptual structure from the very same model cohere less with one another than do human structure estimates. These results highlight an important difference between contemporary LLMs and human cognition, with implications for understanding some fundamental limitations of contemporary machine language.
Submitted 10 November, 2023; v1 submitted 5 April, 2023;
originally announced April 2023.
-
Natural Language Sentence Generation from API Specifications
Authors:
Siyu Huo,
Kushal Mukherjee,
Jayachandu Bandlamudi,
Vatche Isahagian,
Vinod Muthusamy,
Yara Rizk
Abstract:
APIs are everywhere; they provide access to automation solutions that could help businesses automate some of their tasks. Unfortunately, they may not be accessible to the business users who need them but are not equipped with the necessary technical skills to leverage them. Wrapping these APIs with chatbot capabilities is one solution to make these automation solutions interactive. In this work, we propose a system to generate sentences to train intent recognition models, a crucial component within chatbots to understand natural language utterances from users. Evaluation of our approach based on deep learning models showed promising and inspiring results, and the human-in-the-loop interaction will provide further improvement on the system.
Submitted 1 June, 2022;
originally announced June 2022.
-
Context Matters: A Theory of Semantic Discriminability for Perceptual Encoding Systems
Authors:
Kushin Mukherjee,
Brian Yin,
Brianne E. Sherman,
Laurent Lessard,
Karen B. Schloss
Abstract:
People's associations between colors and concepts influence their ability to interpret the meanings of colors in information visualizations. Previous work has suggested such effects are limited to concepts that have strong, specific associations with colors. However, although a concept may not be strongly associated with any colors, its mapping can be disambiguated in the context of other concepts in an encoding system. We articulate this view in semantic discriminability theory, a general framework for understanding conditions determining when people can infer meaning from perceptual features. Semantic discriminability is the degree to which observers can infer a unique mapping between visual features and concepts. Semantic discriminability theory posits that the capacity for semantic discriminability for a set of concepts is constrained by the difference between the feature-concept association distributions across the concepts in the set. We define formal properties of this theory and test its implications in two experiments. The results show that the capacity to produce semantically discriminable colors for sets of concepts was indeed constrained by the statistical distance between color-concept association distributions (Experiment 1). Moreover, people could interpret meanings of colors in bar graphs insofar as the colors were semantically discriminable, even for concepts previously considered "non-colorable" (Experiment 2). The results suggest that colors are more robust for visual communication than previously thought.
Submitted 21 September, 2023; v1 submitted 8 August, 2021;
originally announced August 2021.
-
A Simple Dynamic Learning Rate Tuning Algorithm For Automated Training of DNNs
Authors:
Koyel Mukherjee,
Alind Khare,
Ashish Verma
Abstract:
Training neural networks on image datasets generally requires extensive experimentation to find the optimal learning rate regime. In particular, for adversarial training or for training a newly synthesized model, one would not know the best learning rate regime beforehand. We propose an automated algorithm for determining the learning rate trajectory that works across datasets and models for both natural and adversarial training, without requiring any dataset- or model-specific tuning. It is a stand-alone, parameterless, adaptive approach with no computational overhead. We theoretically discuss the algorithm's convergence behavior. We empirically validate our algorithm extensively. Our results show that our proposed approach \emph{consistently} achieves top-level accuracy compared to SOTA baselines in the literature in natural as well as adversarial training.
Submitted 25 October, 2019;
originally announced October 2019.
-
Layer Dynamics of Linearised Neural Nets
Authors:
Saurav Basu,
Koyel Mukherjee,
Shrihari Vasudevan
Abstract:
Despite the phenomenal success of deep learning in recent years, there remains a gap in understanding the fundamental mechanics of neural nets. More research is focussed on handcrafting complex and larger networks, and the design decisions are often ad-hoc and based on intuition. Some recent research has aimed to demystify the learning dynamics in neural nets by attempting to build a theory from first principles, such as characterising the non-linear dynamics of specialised \textit{linear} deep neural nets (such as orthogonal networks). In this work, we expand and derive properties of learning dynamics respected by general multi-layer linear neural nets. Although an over-parameterisation of a single-layer linear network, linear multi-layer neural nets offer interesting insights that explain how learning dynamics proceed in small pockets of the data space. We show in particular that multiple layers in linear nets grow at approximately the same rate, and there are distinct phases of learning with markedly different layer growth. We then apply a linearisation process to a general ReLU neural net and show how nonlinearity breaks down the growth symmetry observed in linear neural nets. Overall, our work can be viewed as an initial step in building a theory for understanding the effect of layer design on the learning dynamics from first principles.
Submitted 24 April, 2019;
originally announced April 2019.
-
Cogniculture: Towards a Better Human-Machine Co-evolution
Authors:
Rakesh R Pimplikar,
Kushal Mukherjee,
Gyana Parija,
Harit Vishwakarma,
Ramasuri Narayanam,
Sarthak Ahuja,
Rohith D Vallam,
Ritwik Chaudhuri,
Joydeep Mondal
Abstract:
Research in Artificial Intelligence is breaking technology barriers every day. New algorithms and high-performance computing are making possible things which we could only have imagined earlier. Though the enhancements in AI are making life easier for human beings day by day, there is constant fear that AI-based systems will pose a threat to humanity. People in the AI community have a diverse set of opinions regarding the pros and cons of AI mimicking human behavior. Instead of worrying about AI advancements, we propose a novel idea of cognitive agents, including both humans and machines, living together in a complex adaptive ecosystem, collaborating on human computation for producing essential social goods while promoting sustenance, survival and evolution of the agents' life cycle. We highlight several research challenges and technology barriers in achieving this goal. We propose a governance mechanism around this ecosystem to ensure ethical behaviors of all cognitive agents. Along with a novel set of use-cases of Cogniculture, we discuss the road map ahead for this journey.
Submitted 11 December, 2017;
originally announced December 2017.
-
SolarisNet: A Deep Regression Network for Solar Radiation Prediction
Authors:
Subhadip Dey,
Sawon Pratiher,
Saon Banerjee,
Chanchal Kumar Mukherjee
Abstract:
Effective utilization of photovoltaic (PV) plants requires global solar radiation (GSR) forecasting models that are robust to weather variability. Random weather turbulence phenomena, coupled with the assumptions of the clear sky model suggested by Hottel, pose significant challenges to parametric and non-parametric models in GSR conversion rate estimation. Also, a decent GSR estimate requires a costly high-tech radiometer and expert-dependent instrument handling and measurements, which are subjective. As such, a computer aided monitoring (CAM) system is developed to evaluate PV plant operation feasibility by employing smart grid past data analytics and deep learning. Our algorithm, SolarisNet, is a 6-layer deep neural network trained on data collected at two weather stations located near the Kalyani meteorological site, West Bengal, India. The daily GSR prediction performance of SolarisNet exceeds the existing state of the art, and we discuss its efficacy in inferring past GSR data insights to comprehend daily and seasonal GSR variability, along with its competence for short-term forecasting.
Submitted 10 December, 2017; v1 submitted 22 November, 2017;
originally announced November 2017.
-
Impact of Detour-Aware Policies on Maximizing Profit in Ridesharing
Authors:
Arpita Biswas,
Ragavendran Gopalakrishnan,
Theja Tulabandhula,
Asmita Metrewar,
Koyel Mukherjee,
Raja Subramaniam Thangaraj
Abstract:
This paper provides efficient solutions to maximize profit for commercial ridesharing services, under a pricing model with detour-based discounts for passengers. We propose greedy heuristics for real-time ride matching that offer different trade-offs between optimality and speed. Simulations on New York City (NYC) taxi trip data show that our heuristics are up to 90% optimal and 10^5 times faster than the (necessarily) exponential-time optimal algorithm.
Commercial ridesharing service providers generate significant savings by matching multiple ride requests using heuristic methods. The resulting savings are typically shared between the service provider (in the form of increased profit) and the ridesharing passengers (in the form of discounts). It is not clear a priori how this split should be effected, since higher discounts would encourage more ridesharing, thereby increasing total savings, but the fraction of savings taken as profit is reduced. We simulate a scenario where the decisions of the passengers to opt for ridesharing depend on the discount offered by the service provider. We provide an adaptive learning algorithm IDFLA that learns the optimal profit-maximizing discount factor for the provider. An evaluation over NYC data shows that IDFLA, on average, learns the optimal discount factor in under 16 iterations.
Finally, we investigate the impact of imposing a detour-aware routing policy based on sequential individual rationality, a recently proposed concept. Such restricted policies offer a better ride experience, increasing the provider's market share, but at the cost of decreased average per-ride profit due to the reduced number of matched rides. We construct a model that captures these opposing effects, wherein simulations based on NYC data show that a 7% increase in market share would suffice to offset the decreased average per-ride profit.
Submitted 8 June, 2017;
originally announced June 2017.
-
Learning to Partition using Score Based Compatibilities
Authors:
Arun Rajkumar,
Koyel Mukherjee,
Theja Tulabandhula
Abstract:
We study the problem of learning to partition users into groups, where one must learn the compatibilities between the users to achieve optimal groupings. We define four natural objectives that optimize for average and worst case compatibilities and propose new algorithms for adaptively learning optimal groupings. When we do not impose any structure on the compatibilities, we show that the group formation objectives considered are $NP$ hard to solve and we either give approximation guarantees or prove inapproximability results. We then introduce an elegant structure, namely that of \textit{intrinsic scores}, that makes many of these problems polynomial time solvable. We explicitly characterize the optimal groupings under this structure and show that the optimal solutions are related to \emph{homophilous} and \emph{heterophilous} partitions, well-studied in the psychology literature. For one of the four objectives, we show $NP$ hardness under the score structure and give a $\frac{1}{2}$ approximation algorithm for which no constant approximation was known thus far. Finally, under the score structure, we propose an online low sample complexity PAC algorithm for learning the optimal partition. We demonstrate the efficacy of the proposed algorithm on synthetic and real world datasets.
Submitted 22 March, 2017;
originally announced March 2017.
-
LP Rounding and Combinatorial Algorithms for Minimizing Active and Busy Time
Authors:
Jessica Chang,
Samir Khuller,
Koyel Mukherjee
Abstract:
We consider fundamental scheduling problems motivated by energy issues. In this framework, we are given a set of jobs, each with a release time, deadline and required processing length. The jobs need to be scheduled on a machine so that at most g jobs are active at any given time. The duration for which a machine is active (i.e., "on") is referred to as its active time. The goal is to find a feasible schedule for all jobs, minimizing the total active time. When preemption is allowed at integer time points, we show that a minimal feasible schedule already yields a 3-approximation (and this bound is tight) and we further improve this to a 2-approximation via LP rounding techniques. Our second contribution is for the non-preemptive version of this problem. However, since even asking if a feasible schedule on one machine exists is NP-hard, we allow for an unbounded number of virtual machines, each having capacity of g. This problem is known as the busy time problem in the literature and a 4-approximation is known for this problem. We develop a new combinatorial algorithm that gives a 3-approximation. Furthermore, we consider the preemptive busy time problem, giving a simple and exact greedy algorithm when unbounded parallelism is allowed, i.e., g is unbounded. For arbitrary g, this yields an algorithm that is 2-approximate.
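The active-time objective defined in this abstract can be made concrete with a small sketch. It is an illustration only, not the authors' algorithm: the dictionary encoding of jobs and schedules and the helper names `active_time` and `respects_windows` are assumptions introduced here, and a job with release time r and deadline d is assumed to be allowed in integer slots r, ..., d-1.

```python
from collections import Counter


def active_time(schedule, g):
    """Return the number of time slots in which the machine is on.

    schedule: dict mapping a job id to the set of integer time slots
              in which that job is processed (hypothetical encoding).
    g:        maximum number of jobs that may be active in one slot.
    """
    # Count how many jobs run in each slot.
    load = Counter(slot for slots in schedule.values() for slot in slots)
    overloaded = sorted(t for t, n in load.items() if n > g)
    if overloaded:
        raise ValueError(f"more than {g} jobs active in slots {overloaded}")
    # The machine is "on" in exactly the slots that host at least one job.
    return len(load)


def respects_windows(jobs, schedule):
    """Check that each job gets `length` slots inside [release, deadline).

    jobs: dict mapping a job id to (release, deadline, length).
    """
    for job, (release, deadline, length) in jobs.items():
        slots = schedule.get(job, set())
        if len(slots) != length:
            return False
        if any(t < release or t >= deadline for t in slots):
            return False
    return True


# Three jobs packed into two slots on a machine with capacity g = 3.
jobs = {"a": (0, 3, 2), "b": (1, 4, 2), "c": (0, 4, 1)}
schedule = {"a": {1, 2}, "b": {1, 2}, "c": {1}}
print(respects_windows(jobs, schedule), active_time(schedule, g=3))
```

The sketch only evaluates a given schedule; finding one that minimizes the active time is the optimization problem the paper addresses.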
Submitted 25 October, 2016;
originally announced October 2016.
-
The Costs and Benefits of Sharing: Sequential Individual Rationality and Sequential Fairness
Authors:
Ragavendran Gopalakrishnan,
Koyel Mukherjee,
Theja Tulabandhula
Abstract:
In designing dynamic shared service systems that incentivize customers to opt for shared rather than exclusive service, the traditional notion of individual rationality may be insufficient, as a customer's estimated utility could fluctuate arbitrarily during their time in the shared system, as long as their realized utility at service completion is not worse than that for exclusive service. In this work, within a model that explicitly considers the "inconvenience costs" incurred by customers due to sharing, we introduce the notion of sequential individual rationality (SIR) that requires that the disutility of existing customers is nonincreasing as the system state changes due to new customer arrivals. Next, under SIR, we observe that cost sharing can also be viewed as benefit sharing, which inspires a natural definition of sequential fairness (SF) - the total incremental benefit due to a new customer is shared among existing customers in proportion to the incremental inconvenience suffered.
We demonstrate the effectiveness of these notions by applying them to a ridesharing system, where unexpected detours to pick up subsequent passengers inconvenience the existing passengers. Imposing SIR and SF reveals interesting and surprising results, including: (a) natural limits on the incremental detours permissible, (b) exact characterization of "SIR-feasible" routes, which boast sublinear upper and lower bounds on the fractional detours, (c) exact characterization of sequentially fair cost sharing schemes, which includes a strong requirement that passengers must compensate each other for the detour inconveniences that they cause, and (d) new algorithmic problems related to and motivated by SIR.
Submitted 20 June, 2017; v1 submitted 25 July, 2016;
originally announced July 2016.
-
Online Tracking of Skin Colour Regions Against a Complex Background
Authors:
Subhadip Basu,
S. Chakraborty,
K. Mukherjee,
S. K. Pandit
Abstract:
Online tracking of human activity against a complex background is a challenging task for many applications. In this paper, we have developed a robust technique for localizing skin colour regions from unconstrained image frames. A simple and fast segmentation algorithm is used to train a multilayer perceptron (MLP) for detection of skin colours. Stepper motors are synchronized with the MLP to track the movement of the skin colour regions.
Submitted 15 October, 2014;
originally announced October 2014.
-
Online interpretation of numeric sign language using 2-d skeletal model
Authors:
Subhadip Basu,
S. Dey,
K. Mukherjee,
T. S. Jana
Abstract:
Gesturing is one of the natural modes of human communication. Signs produced by gestures can have a basic meaning coupled with additional information that is layered over the basic meaning of the sign. Sign language is an important example of communicative gestures that are highly structured and well accepted across the world as a communication medium for deaf and dumb. In this paper, an online recognition scheme is proposed to interpret the standard numeric sign language comprising 10 basic hand symbols. A web camera is used to capture the real-time hand movements as input to the system. The basic meaning of the hand gesture is extracted from the input data frame by analysing the shape of the hand, considering its orientation, movement and location to be fixed. The input hand shape is processed to identify the palm structure, fingertips and their relative positions and the presence of the extended thumb. A 2-dimensional skeletal model is generated from the acquired shape information to represent and subsequently interpret the basic meaning of the hand gesture.
Submitted 15 October, 2014;
originally announced October 2014.
-
A Game-Theoretic Model Motivated by the DARPA Network Challenge
Authors:
Rajesh Chitnis,
MohammadTaghi Hajiaghayi,
Jonathan Katz,
Koyel Mukherjee
Abstract:
In this paper we propose a game-theoretic model to analyze events similar to the 2009 \emph{DARPA Network Challenge}, which was organized by the Defense Advanced Research Projects Agency (DARPA) for exploring the roles that the Internet and social networks play in incentivizing wide-area collaborations. The challenge was to form a group that would be the first to find the locations of ten moored weather balloons across the United States. We consider a model in which $N$ people (who can form groups) are located in some topology with a fixed coverage volume around each person's geographical location. We consider various topologies where the players can be located, such as the $d$-dimensional Euclidean space and the vertices of a graph. A balloon is placed in the space and a group wins if it is the first one to report the location of the balloon. A larger team has a higher probability of finding the balloon, but we assume that the prize money is divided equally among the team members. Hence there is a competing tension to keep teams as small as possible.
\emph{Risk aversion} is the reluctance of a person to accept a bargain with an uncertain payoff rather than another bargain with a more certain, but possibly lower, expected payoff. In our model we consider the \emph{isoelastic} utility function derived from the Arrow-Pratt measure of relative risk aversion. The main aim is to analyze the structures of the groups in Nash equilibria for our model. For the $d$-dimensional Euclidean space ($d\geq 1$) and the class of bounded degree regular graphs we show that in any Nash Equilibrium the \emph{richest} group (having maximum expected utility per person) covers a constant fraction of the total volume.
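For reference, the isoelastic (constant relative risk aversion) utility named above is commonly written as follows. The symbols $x$ (payoff) and $\rho$ (risk-aversion coefficient) are notation chosen here for illustration; the paper's own parameterization may differ.

```latex
u(x) =
\begin{cases}
\dfrac{x^{1-\rho} - 1}{1-\rho}, & \rho \ge 0,\ \rho \neq 1, \\[4pt]
\ln x, & \rho = 1,
\end{cases}
\qquad
-\frac{x\,u''(x)}{u'(x)} = \rho .
```

The identity on the right is the Arrow-Pratt measure of relative risk aversion, which is constant for this family; $\rho = 0$ corresponds to a risk-neutral player.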
Submitted 30 January, 2013; v1 submitted 30 April, 2012;
originally announced April 2012.