doi 10.1145/3701716.3715242

Knowledge Distillation for Enhancing Walmart E-commerce Search Relevance Using Large Language Models

Authors: Hongwei Shang, Nguyen Vo, Nitin Yadav, Tian Zhang, Ajit Puthenputhussery, Xunfan Cai, Shuyi Chen, Prijith Chandran, Changsung Kang

Abstract: Ensuring the products displayed in e-commerce search results are relevant to users' queries is crucial for improving the user experience. With their advanced semantic understanding, deep learning models have been widely used for relevance matching in search tasks. While large language models (LLMs) offer superior ranking capabilities, it is challenging to deploy LLMs in real-time systems due to their high latency. To leverage the ranking power of LLMs while meeting the low-latency demands of production systems, we propose a novel framework that distills a high-performing LLM into a more efficient, low-latency student model. To help the student model learn more effectively from the teacher model, we first train the teacher LLM as a classification model with soft targets. Then, we train the student model to capture the relevance margin between pairs of products for a given query using mean squared error loss. Instead of using the same training data as the teacher model, we significantly expand the student model dataset by generating unlabeled data and labeling it with the teacher model predictions. Experimental results show that the student model performance continues to improve as the size of the augmented training data increases. In fact, with enough augmented data, the student model can outperform the teacher model. The student model has been successfully deployed in production at Walmart.com with significantly positive metrics.

Submitted 11 May, 2025; originally announced May 2025.

Comments: 9 pages, published at WWW'25

Journal ref: The Web Conference 2025
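
The margin-based distillation objective described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `margin_mse_loss`, the list-based score containers, and the explicit `pairs` argument are all assumptions made for the example. It computes the mean squared error between the student's and the teacher's score margins over pairs of products for a single query.

```python
def margin_mse_loss(student_scores, teacher_scores, pairs):
    """Mean squared error between student and teacher score margins.

    student_scores / teacher_scores: relevance scores for the products
    retrieved for one query (hypothetical containers for illustration).
    pairs: (i, j) index pairs of products whose margins are compared.
    """
    total = 0.0
    for i, j in pairs:
        student_margin = student_scores[i] - student_scores[j]
        teacher_margin = teacher_scores[i] - teacher_scores[j]
        total += (student_margin - teacher_margin) ** 2
    return total / len(pairs)


# Teacher soft scores for three products under one query (made-up values).
teacher = [0.9, 0.2, 0.5]
# A student whose scores are shifted by a constant has identical margins.
student = [1.9, 1.2, 1.5]
loss = margin_mse_loss(student, teacher, [(0, 1), (0, 2), (1, 2)])
```

Because only margins are compared, the student is free to use a different score scale offset from the teacher's, which is one plausible reason to prefer a margin loss over matching raw scores.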

arXiv:2504.19955 [pdf, ps, other]

Robust Federated Personalised Mean Estimation for the Gaussian Mixture Model

Authors: Malhar A. Managoli, Vinod M. Prabhakaran, Suhas Diggavi

Abstract: Federated learning with heterogeneous data and personalization has received significant recent attention. Separately, robustness to corrupted data in the context of federated learning has also been studied. In this paper we explore combining personalization for heterogeneous data with robustness, where a constant fraction of the clients are corrupted. Motivated by this broad problem, we formulate a simple instantiation which captures some of its difficulty. We focus on the specific problem of personalized mean estimation where the data is drawn from a Gaussian mixture model. We give an algorithm whose error depends almost linearly on the ratio of corrupted to uncorrupted samples, and show a lower bound with the same behavior, albeit with a gap of a constant factor.

Submitted 28 April, 2025; originally announced April 2025.

arXiv:2503.15228 [pdf, other]

Sensing-Based Beamformed Resource Allocation in Standalone Millimeter-Wave Vehicular Networks

Authors: Alessandro Traspadini, Anay Ajit Deshpande, Marco Giordani, Chinmay Mahabal, Takayuki Shimizu, Michele Zorzi

Abstract: In 3GPP New Radio (NR) Vehicle-to-Everything (V2X), the new standard for next-generation vehicular networks, vehicles can autonomously select sidelink resources for data transmission, which permits network operations without cellular coverage. However, standalone resource allocation is uncoordinated, and is complicated by the high mobility of the nodes that may introduce unforeseen channel collisions (e.g., when a transmitting vehicle changes path) or free up resources (e.g., when a vehicle moves outside of the communication area). Moreover, unscheduled resource allocation is prone to the hidden node and exposed node problems, which are particularly critical considering directional transmissions. In this paper, we implement and demonstrate a new channel access scheme for NR V2X in Frequency Range 2 (FR2), i.e., at millimeter wave (mmWave) frequencies, based on directional and beamformed transmissions along with Sidelink Control Information (SCI) to select resources for transmission. We prove via simulation that this approach can reduce the probability of collision for resource allocation, compared to a baseline solution that does not configure SCI transmissions.

Submitted 19 March, 2025; originally announced March 2025.

Comments: 7 pages, 8 figures, 3 tables. Accepted for publication in the 2025 IEEE International Conference on Communications (ICC). © 2025 IEEE. A. Traspadini, A. A. Deshpande, M. Giordani, C. Mahabal, T. Shimizu, and M. Zorzi, "Sensing-Based Beamformed Resource Allocation in Standalone Millimeter-Wave Vehicular Networks," in Proc. IEEE International Conference on Communications (ICC), 2025

arXiv:2502.19825 [pdf, other]

Fast Debiasing of the LASSO Estimator

Authors: Shuvayan Banerjee, James Saunderson, Radhendushka Srivastava, Ajit Rajwade

Abstract: In high-dimensional sparse regression, the \textsc{Lasso} estimator offers excellent theoretical guarantees but is well-known to produce biased estimates. To address this, \cite{Javanmard2014} introduced a method to "debias" the \textsc{Lasso} estimates for a random sub-Gaussian sensing matrix $\boldsymbol{A}$. Their approach relies on computing an "approximate inverse" $\boldsymbol{M}$ of the matrix $\boldsymbol{A}^\top \boldsymbol{A}/n$ by solving a convex optimization problem. This matrix $\boldsymbol{M}$ plays a critical role in mitigating bias and allowing for construction of confidence intervals using the debiased \textsc{Lasso} estimates. However, the computation of $\boldsymbol{M}$ is expensive in practice as it requires iterative optimization. In the presented work, we re-parameterize the optimization problem to compute a "debiasing matrix" $\boldsymbol{W} := \boldsymbol{AM}^{\top}$ directly, rather than the approximate inverse $\boldsymbol{M}$. This reformulation retains the theoretical guarantees of the debiased \textsc{Lasso} estimates, as they depend on the \emph{product} $\boldsymbol{AM}^{\top}$ rather than on $\boldsymbol{M}$ alone. Notably, we provide a simple, computationally efficient, closed-form solution for $\boldsymbol{W}$ under similar conditions for the sensing matrix $\boldsymbol{A}$ used in the original debiasing formulation, with an additional condition that the entries of every row of $\boldsymbol{A}$ are uncorrelated. Also, the optimization problem based on $\boldsymbol{W}$ guarantees a unique optimal solution, unlike the original formulation based on $\boldsymbol{M}$. We verify our main result with numerical simulations.

Submitted 27 February, 2025; originally announced February 2025.
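
For orientation, the standard debiasing step can be rewritten in terms of $\boldsymbol{W}$, which shows why the estimate depends on $\boldsymbol{M}$ only through the product $\boldsymbol{AM}^{\top}$. The symbols $\hat{\boldsymbol{x}}_\lambda$ (the Lasso estimate), $\hat{\boldsymbol{x}}_d$ (the debiased estimate), and $\boldsymbol{y}$ (the measurement vector) are notation assumed here for illustration and do not appear in the abstract.

```latex
\hat{\boldsymbol{x}}_d
  = \hat{\boldsymbol{x}}_\lambda
  + \frac{1}{n}\,\boldsymbol{M}\boldsymbol{A}^{\top}
    \bigl(\boldsymbol{y} - \boldsymbol{A}\hat{\boldsymbol{x}}_\lambda\bigr)
  = \hat{\boldsymbol{x}}_\lambda
  + \frac{1}{n}\,\boldsymbol{W}^{\top}
    \bigl(\boldsymbol{y} - \boldsymbol{A}\hat{\boldsymbol{x}}_\lambda\bigr),
\qquad
\boldsymbol{W} := \boldsymbol{A}\boldsymbol{M}^{\top}
\;\Longrightarrow\;
\boldsymbol{W}^{\top} = \boldsymbol{M}\boldsymbol{A}^{\top}.
```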

arXiv:2501.18229 [pdf, other]

GPD: Guided Polynomial Diffusion for Motion Planning

Authors: Ajit Srikanth, Parth Mahanjan, Kallol Saha, Vishal Mandadi, Pranjal Paul, Pawan Wadhwani, Brojeshwar Bhowmick, Arun Singh, Madhava Krishna

Abstract: Diffusion-based motion planners are becoming popular due to their well-established performance improvements, stemming from sample diversity and the ease of incorporating new constraints directly during inference. However, a primary limitation of the diffusion process is the requirement for a substantial number of denoising steps, especially when the denoising process is coupled with gradient-based guidance. In this paper, we introduce diffusion in the parametric space of trajectories, where the parameters are represented as Bernstein coefficients. We show that this representation greatly improves the effectiveness of the cost function guidance and the inference speed. We also introduce a novel stitching algorithm that leverages the diversity in diffusion-generated trajectories to produce collision-free trajectories with just a single cost function-guided model. We demonstrate that our approaches outperform current SOTA diffusion-based motion planners for manipulators and provide an ablation study on key components.

Submitted 30 January, 2025; originally announced January 2025.
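
To make the "Bernstein coefficients" parameterization in the abstract above concrete, here is a small sketch of evaluating a one-dimensional trajectory from its coefficients. The function name `bernstein_trajectory`, the normalized time `t` in [0, 1], and the scalar (single joint) setting are assumptions for illustration; the paper's actual trajectory representation may differ in dimension and degree.

```python
from math import comb


def bernstein_trajectory(coeffs, t):
    """Evaluate a Bernstein-polynomial trajectory at normalized time t.

    coeffs: the n + 1 Bernstein coefficients of a degree-n polynomial
            (hypothetical parameters for a single joint or coordinate).
    t:      normalized time in [0, 1].
    """
    n = len(coeffs) - 1
    return sum(
        c * comb(n, i) * t**i * (1 - t) ** (n - i)
        for i, c in enumerate(coeffs)
    )


# A short coefficient vector describes the whole trajectory; the first and
# last coefficients are exactly the start and end positions.
coeffs = [0.0, 0.5, 2.0, 1.0]
start = bernstein_trajectory(coeffs, 0.0)
end = bernstein_trajectory(coeffs, 1.0)
```

A planner working in this space manipulates only the handful of coefficients rather than every waypoint, which is the kind of compact representation the abstract refers to.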

arXiv:2501.15724 [pdf, other]

A Survey on Computational Pathology Foundation Models: Datasets, Adaptation Strategies, and Evaluation Tasks

Authors: Dong Li, Guihong Wan, Xintao Wu, Xinyu Wu, Ajit J. Nirmal, Christine G. Lian, Peter K. Sorger, Yevgeniy R. Semenov, Chen Zhao

Abstract: Computational pathology foundation models (CPathFMs) have emerged as a powerful approach for analyzing histopathological data, leveraging self-supervised learning to extract robust feature representations from unlabeled whole-slide images. These models, categorized into uni-modal and multi-modal frameworks, have demonstrated promise in automating complex pathology tasks such as segmentation, classification, and biomarker discovery. However, the development of CPathFMs presents significant challenges, such as limited data accessibility, high variability across datasets, the necessity for domain-specific adaptation, and the lack of standardized evaluation benchmarks. This survey provides a comprehensive review of CPathFMs in computational pathology, focusing on datasets, adaptation strategies, and evaluation tasks. We analyze key techniques, such as contrastive learning and multi-modal integration, and highlight existing gaps in current research. Finally, we explore future directions from four perspectives for advancing CPathFMs. This survey serves as a valuable resource for researchers, clinicians, and AI practitioners, guiding the advancement of CPathFMs toward robust and clinically applicable AI-driven pathology solutions.

Submitted 25 February, 2025; v1 submitted 26 January, 2025; originally announced January 2025.

arXiv:2501.12938 [pdf, ps, other]

Robust Hypothesis Testing with Abstention

Authors: Malhar A. Managoli, K. R. Sahasranand, Vinod M. Prabhakaran

Abstract: We study the binary hypothesis testing problem where an adversary may potentially corrupt a fraction of the samples. The detector is, however, permitted to abstain from making a decision if (and only if) the adversary is present. We consider a few natural "contamination models" and characterize for them the trade-off between the error exponents of the four types of errors -- errors of deciding in favour of the incorrect hypothesis when the adversary is present and errors of abstaining or deciding in favour of the wrong hypothesis when the adversary is absent, under the two hypotheses.

Submitted 23 January, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

arXiv:2501.02872 [pdf, other]

Two-Dimensional Unknown View Tomography from Unknown Angle Distributions

Authors: Kaishva Chintan Shah, Karthik S. Gurumoorthy, Ajit Rajwade

Abstract: This study presents a technique for 2D tomography under unknown viewing angles when the distribution of the viewing angles is also unknown. Unknown view tomography (UVT) is a problem encountered in cryo-electron microscopy and in the geometric calibration of CT systems. There exists a moderate-sized literature on the 2D UVT problem, but most existing 2D UVT algorithms assume knowledge of the angle distribution, which is usually not available. Our proposed methodology formulates the problem as an optimization task based on cross-validation error, to estimate the angle distribution jointly with the underlying 2D structure in an alternating fashion. We explore the algorithm's capabilities for the case of two probability distribution models: a semi-parametric mixture of von Mises densities and a probability mass function model. We evaluate our algorithm's performance under noisy projections using a PCA-based denoising technique and Graph Laplacian Tomography (GLT) driven by order statistics of the estimated distribution, to ensure near-perfect ordering, and compare our algorithm to intuitive baselines.

Submitted 6 January, 2025; originally announced January 2025.

Comments: Accepted to the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2025

arXiv:2412.04637 [pdf]

Semantic Retrieval at Walmart

Authors: Alessandro Magnani, Feng Liu, Suthee Chaidaroon, Sachin Yadav, Praveen Reddy Suram, Ajit Puthenputhussery, Sijie Chen, Min Xie, Anirudh Kashi, Tony Lee, Ciya Liao

Abstract: In product search, the retrieval of candidate products before re-ranking is more critical and challenging than in other search settings such as web search, especially for tail queries, which have a complex and specific search intent. In this paper, we present a hybrid system for e-commerce search deployed at Walmart that combines traditional inverted index and embedding-based neural retrieval to better answer user tail queries. Our system significantly improved the relevance of the search engine, measured by both offline and online evaluations. The improvements were achieved through a combination of different approaches. We present a new technique to train the neural model at scale and describe how the system was deployed in production with little impact on response time. We highlight multiple learnings and practical tricks that were used in the deployment of this system.

Submitted 5 December, 2024; originally announced December 2024.

Comments: 9 pages, 2 figures, 10 tables, KDD 2022

arXiv:2409.19548 [pdf, other]

doi 10.1145/3698876

Meta Learning to Rank for Sparsely Supervised Queries

Authors: Xuyang Wu, Ajit Puthenputhussery, Hongwei Shang, Changsung Kang, Yi Fang

Abstract: Supervisory signals are a critical resource for training learning to rank models. In many real-world search and retrieval scenarios, these signals may not be readily available or could be costly to obtain for some queries. Examples include domains where labeling requires professional expertise, applications with strong privacy constraints, and settings where user engagement information is too scarce. We refer to these scenarios as sparsely supervised queries, which pose significant challenges to traditional learning to rank models. In this work, we address sparsely supervised queries by proposing a novel meta learning to rank framework which leverages the fast learning and adaptation capability of meta-learning. The proposed approach accounts for the fact that different queries have different optimal parameters for their rankers, in contrast to traditional learning to rank models which only learn a global ranking model applied to all the queries. Consequently, the proposed method would yield significant advantages, especially when new queries have different characteristics from the training queries. Moreover, the proposed meta learning to rank framework is generic and flexible. We conduct a set of comprehensive experiments on both public datasets and a real-world e-commerce dataset. The results demonstrate that the proposed meta-learning approach can significantly enhance the performance of learning to rank models with sparsely labeled queries.

Submitted 29 September, 2024; originally announced September 2024.

Comments: Accepted at TOIS

arXiv:2409.05345 [pdf, other]

Robust Non-adaptive Group Testing under Errors in Group Membership Specifications

Authors: Shuvayan Banerjee, Radhendushka Srivastava, James Saunderson, Ajit Rajwade

Abstract: Given $p$ samples, each of which may or may not be defective, group testing (GT) aims to determine their defect status by performing tests on $n < p$ 'groups', where a group is formed by mixing a subset of the $p$ samples. Assuming that the number of defective samples is very small compared to $p$, GT algorithms have provided excellent recovery of the status of all $p$ samples with even a small number of groups. Most existing methods, however, assume that the group memberships are accurately specified. This assumption may not always be true in all applications, due to various resource constraints. Such errors could occur, e.g., when a technician, preparing the groups in a laboratory, unknowingly mixes together an incorrect subset of samples as compared to what was specified. We develop a new GT method, the Debiased Robust Lasso Test Method (DRLT), that handles such group membership specification errors. The proposed DRLT method is based on an approach to debias, or reduce the inherent bias in, estimates produced by Lasso, a popular and effective sparse regression technique. We also provide theoretical upper bounds on the reconstruction error produced by our estimator. Our approach is then combined with two carefully designed hypothesis tests respectively for (i) the identification of defective samples in the presence of errors in group membership specifications, and (ii) the identification of groups with erroneous membership specifications. The DRLT approach extends the literature on bias mitigation of statistical estimators such as the LASSO, to handle the important case when some of the measurements contain outliers, due to factors such as group membership specification errors. We present numerical results which show that our approach outperforms several baselines and robust regression techniques for identification of defective samples as well as erroneously specified groups.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2408.14791 [pdf]

doi 10.18280/jesa.570528

Optimizing Structured Data Processing through Robotic Process Automation

Authors: Vivek Bhardwaj, Ajit Noonia, Sandeep Chaurasia, Mukesh Kumar, Abdulnaser Rashid, Mohamed Tahar Ben Othman

Abstract: Robotic Process Automation (RPA) has emerged as a game-changing technology in data extraction, revolutionizing the way organizations process and analyze large volumes of documents such as invoices, purchase orders, and payment advices. This study investigates the use of RPA for structured data extraction and evaluates its advantages over manual processes. By comparing human-performed tasks with those executed by RPA software bots, we assess efficiency and accuracy in data extraction from invoices, focusing on the effectiveness of the RPA system. Through four distinct scenarios involving varying numbers of invoices, we measure efficiency in terms of time and effort required for task completion, as well as accuracy by comparing error rates between manual and RPA processes. Our findings highlight the significant efficiency gains achieved by RPA, with bots completing tasks in significantly less time compared to manual efforts across all cases. Moreover, the RPA system consistently achieves perfect accuracy, mitigating the risk of errors and enhancing process reliability. These results underscore the transformative potential of RPA in optimizing operational efficiency, reducing human labor costs, and improving overall business performance.

Submitted 31 October, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

Journal ref: Journal Européen des Systèmes Automatisés, Vol. 57, No. 5, pp. 1523-1530 (2024)

arXiv:2407.01198 [pdf, ps, other]

Cycles of weight divisible by $k$

Authors: Ajit A. Diwan

Abstract: A weighted (directed) graph is a (directed) graph with integer weights assigned to its vertices and edges. The weight of a subgraph is the sum of weights of vertices and edges in the subgraph. The problem of determining the largest order $f(k)$ of a weighted complete directed graph that does not contain a directed cycle of weight divisible by $k$, for an integer $k \ge 2$, was raised by Alon and Krivelevich [J. Graph Theory 98 (2021) 623-629]. They showed that $f(k)$ is $O(k\log k)$ and $f(k) \le 2k-2$ if $k$ is prime. The best bounds known to us are $f(k) \le 2k-2$ for all $k$ and $f(k) < (3k-1)/2$ for prime $k$. It is also known that $f(k) \ge k$ and this is believed to be the correct value. We prove that $f(k) < k+2\Omega(k)$, where $\Omega(k)$ is the number of prime factors, not necessarily distinct, in the prime factorization of $k$. We also show that any weighted undirected graph of minimum degree $2k-1$ contains a cycle of weight divisible by $k$. This result is proved in the more general setting in which the weights are from a finite abelian group of order $k$, and the cycle has weight equal to the group identity. We conjecture that this holds for undirected graphs with minimum degree $k+1$.

Submitted 1 July, 2024; originally announced July 2024.

Comments: The article that proves the optimal bound for odd k (arXiv:2406.19855) appeared after this had been submitted

MSC Class: 05C35 05C22 05C38
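
The definition in the abstract above (cycle weight = sum of vertex and edge weights on the cycle) can be checked by brute force on very small instances. The sketch below is only an illustration of the definition, not a method from the paper; the function name `has_cycle_divisible_by` and the matrix encoding of arc weights are assumptions, and the search is exponential, so it is usable only for a handful of vertices.

```python
from itertools import combinations, permutations


def has_cycle_divisible_by(vertex_w, edge_w, k):
    """Return True if some directed cycle has weight divisible by k.

    vertex_w[i] is the integer weight of vertex i; edge_w[i][j] is the
    weight of the arc i -> j in a complete directed graph (the diagonal
    is unused). Cycles have at least two vertices.
    """
    n = len(vertex_w)
    for size in range(2, n + 1):
        for subset in combinations(range(n), size):
            first, rest = subset[0], subset[1:]
            # Fix the smallest vertex as the start so each directed cycle
            # on this vertex set is enumerated exactly once.
            for order in permutations(rest):
                cycle = (first,) + order
                weight = sum(vertex_w[v] for v in cycle)
                weight += sum(
                    edge_w[cycle[i]][cycle[(i + 1) % size]]
                    for i in range(size)
                )
                if weight % k == 0:
                    return True
    return False


def zero_arcs(n):
    """All-zero arc weights for a complete digraph on n vertices."""
    return [[0] * n for _ in range(n)]


k = 5
# Unit vertex weights, zero arc weights: a cycle's weight is its length.
print(has_cycle_divisible_by([1] * 4, zero_arcs(4), k))  # False
print(has_cycle_divisible_by([1] * 5, zero_arcs(5), k))  # True
```

With this toy weighting, four vertices admit only cycles of weight 2, 3, or 4, none divisible by 5, while five vertices admit a Hamiltonian cycle of weight 5.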

arXiv:2406.17542 [pdf, ps, other]

CDQuant: Greedy Coordinate Descent for Accurate LLM Quantization

Authors: Pranav Ajit Nair, Arun Sai Suggala

Abstract: Large language models (LLMs) have recently demonstrated remarkable performance across diverse language tasks. However, their deployment is often constrained by their substantial computational and storage requirements. Quantization has emerged as a key technique for addressing this challenge, enabling the compression of large models with minimal impact on performance. The recent GPTQ algorithm, a post-training quantization (PTQ) method, has proven highly effective for compressing LLMs, sparking a wave of research that leverages GPTQ as a core component. Recognizing the pivotal role of GPTQ in the PTQ landscape, we introduce CDQuant, a simple and scalable alternative to GPTQ with improved performance. CDQuant uses greedy coordinate descent to minimize the layer-wise reconstruction loss to achieve high-quality quantized weights. Our algorithm is easy to implement and scales efficiently to models with hundreds of billions of parameters. We perform extensive evaluation on the Gemma and PaLM2 model families, and demonstrate that CDQuant consistently outperforms GPTQ in 2-4 bit weight quantization. Moreover, CDQuant improves the performance of state-of-the-art PTQ techniques such as QuIP and FrameQuant when used as a replacement for their GPTQ component, resulting in further gains in quality.

Submitted 22 October, 2024; v1 submitted 25 June, 2024; originally announced June 2024.
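
The idea of greedy coordinate descent on a layer-wise reconstruction loss, as named in the abstract above, can be illustrated with a deliberately naive sketch. This is not the CDQuant algorithm itself: the function names, the single weight vector (one output unit), the fixed quantization `grid`, and the full loss recomputation at every step are all simplifying assumptions. Each iteration changes the one coordinate whose new grid value reduces the loss $\|X w - X q\|^2$ the most.

```python
def recon_loss(X, w, q):
    """Squared reconstruction error sum over rows of (x . (w - q))^2."""
    return sum(
        sum(xi * (wi - qi) for xi, wi, qi in zip(row, w, q)) ** 2
        for row in X
    )


def greedy_cd_quantize(X, w, grid, max_iters=100):
    """Quantize w onto grid by greedy coordinate descent (naive sketch).

    X:    calibration inputs, one row per sample (hypothetical data).
    w:    full-precision weights for a single output unit.
    grid: allowed quantized values.
    """
    # Start from round-to-nearest, then greedily improve one coordinate
    # at a time.
    q = [min(grid, key=lambda g: abs(g - wi)) for wi in w]
    best = recon_loss(X, w, q)
    for _ in range(max_iters):
        best_move = None
        for j in range(len(w)):
            old = q[j]
            for g in grid:
                if g == old:
                    continue
                q[j] = g
                loss = recon_loss(X, w, q)
                if loss < best - 1e-12:
                    best, best_move = loss, (j, g)
            q[j] = old
        if best_move is None:
            break  # no single-coordinate change lowers the loss
        j, g = best_move
        q[j] = g
    return q, best


X = [[1.0, 1.0]]
w = [0.4, 0.4]
grid = [0.0, 1.0]
rtn_loss = recon_loss(X, w, [0.0, 0.0])  # round-to-nearest baseline
q, loss = greedy_cd_quantize(X, w, grid)
```

In this toy case round-to-nearest sends both weights to 0, while the greedy step rounds one of them up so that the errors partly cancel in the layer output.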

arXiv:2406.00247 [pdf, other]

Large Language Models for Relevance Judgment in Product Search

Authors: Navid Mehrdad, Hrushikesh Mohapatra, Mossaab Bagdouri, Prijith Chandran, Alessandro Magnani, Xunfan Cai, Ajit Puthenputhussery, Sachin Yadav, Tony Lee, ChengXiang Zhai, Ciya Liao

Abstract: High relevance of retrieved and re-ranked items to the search query is the cornerstone of successful product search, yet measuring relevance of items to queries is one of the most challenging tasks in product information retrieval, and quality of product search is highly influenced by the precision and scale of available relevance-labelled data. In this paper, we present an array of techniques for leveraging Large Language Models (LLMs) for automating the relevance judgment of query-item pairs (QIPs) at scale. Using a unique dataset of multi-million QIPs, annotated by human evaluators, we test and optimize hyperparameters for finetuning billion-parameter LLMs with and without Low-Rank Adaptation (LoRA), as well as various modes of item attribute concatenation and prompting in LLM finetuning, and consider trade-offs in item attribute inclusion for quality of relevance predictions. We demonstrate considerable improvement over baselines of prior generations of LLMs, as well as off-the-shelf models, towards relevance annotations on par with the human relevance evaluators. Our findings have immediate implications for the growing field of relevance judgment automation in product search.

Submitted 16 July, 2024; v1 submitted 31 May, 2024; originally announced June 2024.

Comments: 10 pages, 1 figure, 11 tables - SIGIR 2024, LLM4Eval

ACM Class: H.3.3; I.2.7

arXiv:2405.21004 [pdf, other]

doi 10.1145/3675095.3676619

MunchSonic: Tracking Fine-grained Dietary Actions through Active Acoustic Sensing on Eyeglasses

Authors: Saif Mahmud, Devansh Agarwal, Ashwin Ajit, Qikang Liang, Thalia Viranda, Francois Guimbretiere, Cheng Zhang

Abstract: We introduce MunchSonic, an AI-powered active acoustic sensing system integrated into eyeglasses to track fine-grained dietary actions. MunchSonic emits inaudible ultrasonic waves from the eyeglass frame, with the reflected signals capturing detailed positions and movements of body parts, including the mouth, jaw, arms, and hands involved in eating. These signals are processed by a deep learning pipeline to classify six actions: hand-to-mouth movements for food intake, chewing, drinking, talking, face-hand touching, and other activities (null). In an unconstrained study with 12 participants, MunchSonic achieved a 93.5% macro F1-score in a user-independent evaluation with a 2-second resolution in tracking these actions, also demonstrating its effectiveness in tracking eating episodes and food intake frequency within those episodes.

Submitted 2 August, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

Comments: 8 pages, 7 figures

arXiv:2405.16654 [pdf, other]

doi 10.1145/3643834.3660714

Ethics Pathways: A Design Activity for Reflecting on Ethics Engagement in HCI Research

Authors: Inha Cha, Ajit G. Pillai, Richmond Y. Wong

Abstract: This paper introduces Ethics Pathways, a design activity aimed at understanding HCI and design researchers' ethics engagements and flows during their research process. Despite a strong ethical commitment in these fields, challenges persist in grasping the complexity of researchers' engagement with ethics -- practices conducted to operationalize ethics -- in situated institutional contexts. Ethics Pathways, developed through six playtesting sessions, offers a design approach to understanding the complexities of researchers' past ethics engagements in their work. This activity involves four main tasks: recalling ethical incidents; describing stakeholders involved in the situation; recounting their actions or speculative alternatives; and reflection and emotion walk-through. The paper reflects on the role of design decisions and facilitation strategies in achieving these goals. The design activity contributes to the discourse on ethical HCI research by conceptualizing ethics engagement as a part of ongoing research processing, highlighting connections between individual affective experiences, social interactions across power differences, and institutional goals.

Submitted 26 May, 2024; originally announced May 2024.

Comments: Accepted at ACM Designing Interactive Systems (DIS) 2024

arXiv:2405.15416 [pdf, ps, other]

doi 10.46298/dmtcs.13929

Planar cycle-extendable graphs

Authors: Aditya Y Dalwadi, Kapil R Shenvi Pause, Ajit A Diwan, Nishad Kothari

Abstract: For most problems pertaining to perfect matchings, one may restrict attention to matching covered graphs - that is, connected nontrivial graphs with the property that each edge belongs to some perfect matching. There is extensive literature on these graphs that are also known as 1-extendable graphs (since each edge extends to a perfect matching) including an ear decomposition theorem due to Lovász and Plummer. A cycle $C$ of a graph $G$ is conformal if $G-V(C)$ has a perfect matching; such cycles play an important role in the study of perfect matchings, especially when investigating the Pfaffian orientation problem. A matching covered graph $G$ is cycle-extendable if - for each even cycle $C$ - the cycle $C$ is conformal, or equivalently, each perfect matching of $C$ extends to a perfect matching of $G$, or equivalently, $C$ is the symmetric difference of two perfect matchings of $G$, or equivalently, $C$ extends to an ear decomposition of $G$. In the literature, these are also known as cycle-nice or as 1-cycle resonant graphs. Zhang, Wang, Yuan, Ng and Cheng, 2022, provided a characterization of claw-free cycle-extendable graphs. Guo and Zhang, 2004, and independently Zhang and Li, 2012, provided characterizations of bipartite planar cycle-extendable graphs. In this paper, we establish a characterization of all planar cycle-extendable graphs - in terms of $K_2$ and four infinite families.

Submitted 12 May, 2025; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: The last author Nishad Kothari would like to acknowledge Rajat Adak (currently a PhD student at IISc) for many discussions on cycle-extendability (while he was a BSc student at CMI)

Journal ref: Discrete Mathematics & Theoretical Computer Science, vol. 27:2, Graph Theory (May 13, 2025) dmtcs:13929

arXiv:2405.05211 [pdf, ps, other]

Broadcast Channel Synthesis from Shared Randomness

Authors: Malhar A. Managoli, Vinod M. Prabhakaran

Abstract: We study the problem of synthesising a two-user broadcast channel using a common message, where each output terminal shares an independent source of randomness with the input terminal. This generalises two problems studied in the literature (Cuff, IEEE Trans. Inform. Theory, 2013; Kurri et al., IEEE Trans. Inform. Theory, 2021). We give an inner bound on the tradeoff region between the rates of communication and shared randomness, and a lower bound on the minimum communication rate. Although the bounds presented here are not tight in general, they are tight for some special cases, including the aforementioned problems.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.02585 [pdf, ps, other]

Maximal Guesswork Leakage

Authors: Gowtham R. Kurri, Malhar Managoli, Vinod M. Prabhakaran

Abstract: We introduce the study of information leakage through \emph{guesswork}, the minimum expected number of guesses required to guess a random variable. In particular, we define \emph{maximal guesswork leakage} as the multiplicative decrease, upon observing $Y$, of the guesswork of a randomized function of $X$, maximized over all such randomized functions. We also study a pointwise form of the leakage which captures the leakage due to the release of a single realization of $Y$. We also study these two notions of leakage with oblivious (or memoryless) guessing. We obtain closed-form expressions for all these leakage measures, with the exception of one. Specifically, we are able to obtain a closed-form expression for maximal guesswork leakage for the binary erasure source only; deriving expressions for arbitrary sources appears challenging. Some of the consequences of our results are -- a connection between guesswork and differential privacy and a new operational interpretation to maximal $α$-leakage in terms of guesswork.

Submitted 4 May, 2024; originally announced May 2024.

Comments: 6 pages. Extended version of a paper accepted to ISIT 2024

arXiv:2404.17390 [pdf, other]

How Could AI Support Design Education? A Study Across Fields Fuels Situating Analytics

Authors: Ajit Jain, Andruid Kerne, Hannah Fowler, Jinsil Seo, Galen Newman, Nic Lupfer, Aaron Perrine

Abstract: We use the process and findings from a case study of design educators' practices of assessment and feedback to fuel theorizing about how to make AI useful in service of human experience. We build on Suchman's theory of situated actions. We perform a qualitative study of 11 educators in 5 fields, who teach design processes situated in project-based learning contexts. Through qualitative data gathering and analysis, we derive codes: design process; assessment and feedback challenges; and computational support. We twice invoke creative cognition's family resemblance principle. First, to explain how design instructors already use assessment rubrics and second, to explain the analogous role for design creativity analytics: no particular trait is necessary or sufficient; each only tends to indicate good design work. Human teachers remain essential. We develop a set of situated design creativity analytics--Fluency, Flexibility, Visual Consistency, Multiscale Organization, and Legible Contrast--to support instructors' efforts, by providing on-demand, learning objectives-based assessment and feedback to students. We theorize a methodology, which we call situating analytics, firstly because making AI support living human activity depends on aligning what analytics measure with situated practices. Further, we realize that analytics can become most significant to users by situating them through interfaces that integrate them into the material contexts of their use. Here, this means situating design creativity analytics into actual design environments.
Through the case study, we identify situating analytics as a methodology for explaining analytics to users, because the iterative process of alignment with practice has the potential to enable data scientists to derive analytics that make sense as part of and support situated human experiences.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 31 pages, 3 figures, Submitted to ACM

ACM Class: H.5.2

arXiv:2404.13933 [pdf]

Comparison of On-Orbit Manual Attitude Control Methods for Non-Docking Spacecraft Through Virtual Reality Simulation

Authors: Ajit Krishnan, Himanshu Vishwakarma, Maharudra Kharsade, Pradipta Biswas

Abstract: On-orbit manual attitude control of manned spacecraft is accomplished using external visual references and some method of three-axis attitude control. All past, present, and developmental spacecraft feature the capability to manually control attitude for deorbit. National Aeronautics and Space Administration (NASA) spacecraft permit an aircraft windshield type front view, wherein an arc of the Earth's horizon is visible to the crew in deorbit attitude. Russian and Chinese spacecraft permit the crew a bottom view wherein the entire circular Earth horizon disk is visible to the crew in deorbit attitude. Our study compared these two types of external views for efficiency in achievement of deorbit attitude. We used a Unity Virtual Reality (VR) spacecraft simulator that we built in house. The task was to accurately achieve deorbit attitude while in a 400 km circular orbit. Six military test pilots and six civilians with gaming experience flew the task using two methods of visual reference. Comparison was based on time taken, fuel consumed, cognitive workload assessment and user preference. We used ocular parameters, EEG, NASA TLX and IBM SUS to quantify our results. Our study found that the bottom view was easier to operate for the manual deorbit task. Additionally, we realized that a VR-based system can work as a training simulator for manual on-orbit flight path control tasks by pilots and non-pilots. Results from our study can be used for the design of manual on-orbit attitude control of present and future spacecraft.

Submitted 22 April, 2024; originally announced April 2024.

ACM Class: H.5.2

arXiv:2404.13924 [pdf, other]

doi 10.1145/3699752

ActSonic: Recognizing Everyday Activities from Inaudible Acoustic Wave Around the Body

Authors: Saif Mahmud, Vineet Parikh, Qikang Liang, Ke Li, Ruidong Zhang, Ashwin Ajit, Vipin Gunda, Devansh Agarwal, François Guimbretière, Cheng Zhang

Abstract: We present ActSonic, an intelligent, low-power active acoustic sensing system integrated into eyeglasses that can recognize 27 different everyday activities (e.g., eating, drinking, toothbrushing) from inaudible acoustic waves around the body. It requires only a pair of miniature speakers and microphones mounted on each hinge of the eyeglasses to emit ultrasonic waves, creating an acoustic aura around the body. The acoustic signals are reflected based on the position and motion of various body parts, captured by the microphones, and analyzed by a customized self-supervised deep learning framework to infer the performed activities on a remote device such as a mobile phone or cloud server. ActSonic was evaluated in user studies with 19 participants across 19 households to track its efficacy in everyday activity recognition. Without requiring any training data from new users (leave-one-participant-out evaluation), ActSonic detected 27 activities, achieving an average F1-score of 86.6% in fully unconstrained scenarios and 93.4% in prompted settings at participants' homes.

Submitted 25 November, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 8, Issue 4, November 2024, IMWUT/UbiComp 2025

arXiv:2404.06445 [pdf, ps, other]

Extremal minimal bipartite matching covered graphs

Authors: Amit Kumar Mallik, Ajit A. Diwan, Nishad Kothari

Abstract: A connected graph, on four or more vertices, is matching covered if every edge is present in some perfect matching. An ear decomposition theorem (similar to the one for $2$-connected graphs) exists for bipartite matching covered graphs due to Hetyei. From the results and proofs of Lovász and Plummer, that rely on Hetyei's theorem, one may deduce that any minimal bipartite matching covered graph has at least $2(m-n+2)$ vertices of degree two (where minimal means that deleting any edge results in a graph that is not matching covered); such a graph is said to be extremal if it attains the stated lower bound. In this paper, we provide a complete characterization of the class of extremal minimal bipartite matching covered graphs. In particular, we prove that every such graph $G$ is obtained from two copies of a tree devoid of degree two vertices, say $T$ and $T'$, by adding edges -- each of which joins a leaf of $T$ with the corresponding leaf of $T'$. Apart from the aforementioned bound, there are four other bounds that appear in, or may be deduced from, the work of Lovász and Plummer. Each of these bounds leads to a notion of extremality. In this paper, we obtain a complete characterization of all of these extremal classes and also establish relationships between them. Two of our characterizations are in the same spirit as the one stated above. For the remaining two extremal classes, we reduce each of them to one of the already characterized extremal classes using standard matching theoretic operations.

Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: Submitted to Innovations in Graph Theory

arXiv:2404.05417 [pdf, other]

Indexing Analytics to Instances: How Integrating a Dashboard can Support Design Education

Authors: Ajit Jain, Andruid Kerne, Nic Lupfer, Gabriel Britain, Aaron Perrine, Yoonsuck Choe, John Keyser, Ruihong Huang, Jinsil Seo, Annie Sungkajun, Robert Lightfoot, Timothy McGuire

Abstract: We investigate how to use AI-based analytics to support design education. The analytics at hand measure multiscale design, that is, students' use of space and scale to visually and conceptually organize their design work. With the goal of making the analytics intelligible to instructors, we developed a research artifact integrating a design analytics dashboard with design instances, and the design environment that students use to create them. We theorize about how Suchman's notion of mutual intelligibility requires contextualized investigation of AI in order to develop findings about how analytics work for people. We studied the research artifact in 5 situated course contexts, in 3 departments. A total of 236 students used the multiscale design environment. The 9 instructors who taught those students experienced the analytics via the new research artifact. We derive findings from a qualitative analysis of interviews with instructors regarding their experiences. Instructors reflected on how the analytics and their presentation in the dashboard have the potential to affect design education. We develop research implications addressing: (1) how indexing design analytics in the dashboard to actual design work instances helps design instructors reflect on what they mean and, more broadly, is a technique for how AI-based design analytics can support instructors' assessment and feedback experiences in situated course contexts; and (2) how multiscale design analytics, in particular, have the potential to support design education.
By indexing, we mean linking which provides context, here connecting the numbers of the analytics with visually annotated design work instances.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 22 pages, 4 figures, Submitted to ACM DIS

ACM Class: H.5.2

arXiv:2403.12120 [pdf, other]

doi 10.1016/j.ascom.2024.100850

Light Curve Classification with DistClassiPy: a new distance-based classifier

Authors: Siddharth Chaini, Ashish Mahabal, Ajit Kembhavi, Federica B. Bianco

Abstract: The rise of synoptic sky surveys has ushered in an era of big data in time-domain astronomy, making data science and machine learning essential tools for studying celestial objects. While tree-based models (e.g. Random Forests) and deep learning models dominate the field, we explore the use of different distance metrics to aid in the classification of astrophysical objects. We developed DistClassiPy, a new distance metric based classifier. The direct use of distance metrics is unexplored in time-domain astronomy, but distance-based methods can help make classification more interpretable and decrease computational costs. In particular, we applied DistClassiPy to classify light curves of variable stars, comparing the distances between objects of different classes. Using 18 distance metrics on a catalog of 6,000 variable stars across 10 classes, we demonstrate classification and dimensionality reduction. Our classifier meets state-of-the-art performance but has lower computational requirements and improved interpretability. Additionally, DistClassiPy can be tailored to specific objects by identifying the most effective distance metric for that classification. To facilitate broader applications within and beyond astronomy, we have made DistClassiPy open-source and available at https://pypi.org/project/distclassipy/.

Submitted 25 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: Accepted for publication in Astronomy and Computing (2024). 24 pages, 19 figures

arXiv:2402.08644 [pdf, other]

Tandem Transformers for Inference Efficient LLMs

Authors: Aishwarya P S, Pranav Ajit Nair, Yashas Samaga, Toby Boyd, Sanjiv Kumar, Prateek Jain, Praneeth Netrapalli

Abstract: The autoregressive nature of conventional large language models (LLMs) inherently limits inference speed, as tokens are generated sequentially. While speculative and parallel decoding techniques attempt to mitigate this, they face limitations: either relying on less accurate smaller models for generation or failing to fully leverage the base LLM's representations. We introduce a novel architecture, Tandem transformers, to address these issues. This architecture uniquely combines (1) a small autoregressive model and (2) a large model operating in block mode (processing multiple tokens simultaneously). The small model's predictive accuracy is substantially enhanced by granting it attention to the large model's richer representations. On the PaLM2 pretraining dataset, a tandem of PaLM2-Bison and PaLM2-Gecko demonstrates a 3.3% improvement in next-token prediction accuracy over a standalone PaLM2-Gecko, offering a 1.16x speedup compared to a PaLM2-Otter model with comparable downstream performance. We further incorporate the tandem model within the speculative decoding (SPEED) framework where the large model validates tokens from the small model. This ensures that the Tandem of PaLM2-Bison and PaLM2-Gecko achieves substantial speedup (around 1.14x faster than using vanilla PaLM2-Gecko in SPEED) while maintaining identical downstream task accuracy.

Submitted 20 October, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.07637 [pdf, other]

Compressive Recovery of Signals Defined on Perturbed Graphs

Authors: Sabyasachi Ghosh, Ajit Rajwade

Abstract: Recovery of signals with elements defined on the nodes of a graph, from compressive measurements is an important problem, which can arise in various domains such as sensor networks, image reconstruction and group testing. In some scenarios, the graph may not be accurately known, and there may exist a few edge additions or deletions relative to a ground truth graph. Such perturbations, even if small in number, significantly affect the Graph Fourier Transform (GFT). This impedes recovery of signals which may have sparse representations in the GFT bases of the ground truth graph. We present an algorithm which simultaneously recovers the signal from the compressive measurements and also corrects the graph perturbations. We analyze some important theoretical properties of the algorithm. Our approach to correction for graph perturbations is based on model selection techniques such as cross-validation in compressed sensing. We validate our algorithm on signals which have a sparse representation in the GFT bases of many commonly used graphs in the network science literature. An application to compressive image reconstruction is also presented, where graph perturbations are modeled as undesirable graph edges linking pixels with significant intensity difference. In all experiments, our algorithm clearly outperforms baseline techniques which either ignore the perturbations or use first order approximations to the perturbations in the GFT bases.

Submitted 16 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: 18 pages, 15 figures. v2: Minor correction in ref [32]

arXiv:2312.04294 [pdf, ps, other]

Energy-Efficient Internet of Things Monitoring with Content-Based Wake-Up Radio

Authors: Anay Ajit Deshpande, Federico Chiariotti, Andrea Zanella

Abstract: The use of Wake-Up Radio (WUR) in Internet of Things (IoT) networks can significantly improve their energy efficiency: battery-powered sensors can remain in a low-power (sleep) mode while listening for wake-up messages using their WUR and reactivate only when polled. However, polling-based WUR may still lead to wasted energy if values sensed by the polled sensors provide no new information to the receiver, or in general have a low Value of Information (VoI). In this paper, we design a content-based WUR that tracks the process observed by the sensors and only wakes up the sensor if its estimated update's VoI is higher than a threshold communicated through the poll. If the sensor does not reply to the polling request, the Gateway (GW) can make a Bayesian update, knowing that either the sensor value substantially confirms its current estimate or the transmission failed due to the wireless channel. We analyze the trade-off between the tracking error and the battery lifetime of the sensors, showing that content-based WUR can provide fine-grained control of this trade-off and significantly increase the battery lifetime of the node with a minimal Mean Squared Error (MSE) increase.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.01532 [pdf, other]

Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments

Authors: Shanqing Cai, Subhashini Venugopalan, Katie Seaver, Xiang Xiao, Katrin Tomanek, Sri Jalasutram, Meredith Ringel Morris, Shaun Kane, Ajit Narayanan, Robert L. MacDonald, Emily Kornman, Daniel Vance, Blair Casey, Steve M. Gleason, Philip Q. Nelson, Michael P. Brenner

Abstract: Finding ways to accelerate text input for individuals with profound motor impairments has been a long-standing area of research. Closing the speed gap for augmentative and alternative communication (AAC) devices such as eye-tracking keyboards is important for improving the quality of life for such individuals. Recent advances in neural networks of natural language pose new opportunities for re-thinking strategies and user interfaces for enhanced text-entry for AAC users. In this paper, we present SpeakFaster, consisting of large language models (LLMs) and a co-designed user interface for text entry in a highly-abbreviated form, allowing saving 57% more motor actions than traditional predictive keyboards in offline simulation. A pilot study with 19 non-AAC participants typing on a mobile device by hand demonstrated gains in motor savings in line with the offline simulation, while introducing relatively small effects on overall typing speed. Lab and field testing on two eye-gaze typing users with amyotrophic lateral sclerosis (ALS) demonstrated text-entry rates 29-60% faster than traditional baselines, due to significant saving of expensive keystrokes achieved through phrase and word predictions from context-aware LLMs. These findings provide a strong foundation for further exploration of substantially-accelerated text communication for motor-impaired users and demonstrate a direction for applying LLMs to text-based user interfaces.

Submitted 3 December, 2023; originally announced December 2023.

arXiv:2311.13821 [pdf, other]

HypUC: Hyperfine Uncertainty Calibration with Gradient-boosted Corrections for Reliable Regression on Imbalanced Electrocardiograms

Authors: Uddeshya Upadhyay, Sairam Bade, Arjun Puranik, Shahir Asfahan, Melwin Babu, Francisco Lopez-Jimenez, Samuel J. Asirvatham, Ashim Prasad, Ajit Rajasekharan, Samir Awasthi, Rakesh Barve

Abstract: The automated analysis of medical time series, such as the electrocardiogram (ECG), electroencephalogram (EEG), pulse oximetry, etc, has the potential to serve as a valuable tool for diagnostic decisions, allowing for remote monitoring of patients and more efficient use of expensive and time-consuming medical procedures. Deep neural networks (DNNs) have been demonstrated to process such signals effectively. However, previous research has primarily focused on classifying medical time series rather than attempting to regress the continuous-valued physiological parameters central to diagnosis. One significant challenge in this regard is the imbalanced nature of the dataset, as a low prevalence of abnormal conditions can lead to heavily skewed data that results in inaccurate predictions and a lack of certainty in such predictions when deployed. To address these challenges, we propose HypUC, a framework for imbalanced probabilistic regression in medical time series, making several contributions. (i) We introduce a simple kernel density-based technique to tackle the imbalanced regression problem with medical time series. (ii) Moreover, we employ a probabilistic regression framework that allows uncertainty estimation for the predicted continuous values. (iii) We also present a new approach to calibrate the predicted uncertainty further. (iv) Finally, we demonstrate a technique to use calibrated uncertainty estimates to improve the predicted continuous value and show the efficacy of the calibrated uncertainty estimates to flag unreliable predictions.
HypUC is evaluated on a large, diverse, real-world dataset of ECGs collected from millions of patients, outperforming several conventional baselines on various diagnostic tasks, suggesting a potential use-case for the reliable clinical deployment of deep learning models.

Submitted 23 November, 2023; originally announced November 2023.

Comments: Published at TMLR

Journal ref: Transactions on Machine Learning Research (TMLR), 2023

arXiv:2311.02573 [pdf, other]

Group Testing for Accurate and Efficient Range-Based Near Neighbor Search for Plagiarism Detection

Authors: Harsh Shah, Kashish Mittal, Ajit Rajwade

Abstract: This work presents an adaptive group testing framework for the range-based high dimensional near neighbor search problem. Our method efficiently marks each item in a database as neighbor or non-neighbor of a query point, based on a cosine distance threshold without exhaustive search. Like other methods for large scale retrieval, our approach exploits the assumption that most of the items in the database are unrelated to the query. However, it does not assume a large difference between the cosine similarity of the query vector with the least related neighbor and that with the least unrelated non-neighbor. Following a multi-stage adaptive group testing algorithm based on binary splitting, we divide the set of items to be searched in half at each step, and perform dot product tests on smaller and smaller subsets, many of which we are able to prune away. We show that, using softmax-based features, our method achieves a more than ten-fold speed-up over exhaustive search with no loss of accuracy, on a variety of large datasets. Based on empirically verified models for the distribution of cosine distances, we present a theoretical analysis of the expected number of distance computations per query and the probability that a pool will be pruned.
Our method has the following features: (i) It implicitly exploits useful distributional properties of cosine distances unlike other methods; (ii) All required data structures are created purely offline; (iii) It does not impose any strong assumptions on the number of true near neighbors; (iv) It is adaptable to streaming settings where new vectors are dynamically added to the database; and (v) It does not require any parameter tuning. The high recall of our technique makes it particularly suited to plagiarism detection scenarios where it is important to report every database item that is sufficiently similar to the query.

Submitted 6 September, 2024; v1 submitted 5 November, 2023; originally announced November 2023.

Comments: 28 pages (including Supplementary Material)
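
The binary-splitting idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes non-negative, unit-norm feature vectors (as softmax-based features would give), so that a pooled dot product below the threshold implies every item in the pool is below it. The names `build_prefix_sums` and `range_search` are hypothetical, and prefix sums stand in for whatever offline data structures the paper actually uses.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_prefix_sums(db):
    """Offline step: prefix[i] holds the element-wise sum of db[0..i-1]."""
    dim = len(db[0]) if db else 0
    prefix = [[0.0] * dim]
    for v in db:
        prefix.append([p + x for p, x in zip(prefix[-1], v)])
    return prefix

def range_search(query, db, prefix, threshold):
    """Return (indices with dot(query, db[i]) >= threshold, number of dot products).

    Assumes all vectors are non-negative, so each item's dot product is at
    most the pooled one and a pool that fails the test can be discarded whole.
    """
    neighbors, tests = [], 0
    stack = [(0, len(db))]
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        if hi - lo == 1:
            # A pool of one item is tested directly against the threshold.
            tests += 1
            if dot(query, db[lo]) >= threshold:
                neighbors.append(lo)
            continue
        # Sum of db[lo:hi], recovered from the precomputed prefix sums.
        pooled = [b - a for a, b in zip(prefix[lo], prefix[hi])]
        tests += 1
        if dot(query, pooled) < threshold:
            continue  # prune: no item in this pool can reach the threshold
        mid = (lo + hi) // 2
        stack.append((lo, mid))
        stack.append((mid, hi))
    return sorted(neighbors), tests
```

When most items are unrelated to the query, most pools are pruned after a single test, which is where the saving over exhaustive search comes from. In floating point, prefix-sum differences carry rounding error, so a practical version would likely test against a slightly lowered threshold.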

arXiv:2310.15233 [pdf, other]

doi 10.1103/PhysRevD.110.084035

New approach to template banks of gravitational waves with higher harmonics: Reducing matched-filtering cost by over an order of magnitude

Authors: Digvijay Wadekar, Tejaswi Venumadhav, Ajit Kumar Mehta, Javier Roulet, Seth Olsen, Jonathan Mushkin, Barak Zackay, Matias Zaldarriaga

Abstract: Searches for gravitational wave events use models, or templates, for the signals of interest. The templates used in current searches in the LIGO-Virgo-KAGRA (LVK) data model the dominant quadrupole mode $(\ell,|m|)=(2,2)$ of the signals, and omit sub-dominant higher-order modes (HM) such as $(\ell,|m|)=(3,3)$, $(4,4)$, which are predicted by general relativity. This omission reduces search sensitivity to black hole mergers in interesting parts of parameter space, such as systems with high masses and asymmetric mass-ratios. We develop a new strategy to include HM in template banks: instead of making templates containing a combination of different modes, we separately store normalized templates corresponding to $(2,2)$, $(3,3)$ and $(4,4)$ modes. To model aligned-spin $(3,3)$, $(4,4)$ waveforms corresponding to a given $(2,2)$ waveform, we use a combination of post-Newtonian formulae and machine learning tools. In the matched filtering stage, one can filter each mode separately with the data and collect the timeseries of signal-to-noise ratios (SNR). This leads to a HM template bank whose matched-filtering cost is just $\approx 3\times$ that of a quadrupole-only search (as opposed to $\approx\! 100 \times$ in previously proposed HM search methods). Our method is effectual and generally applicable for template banks constructed with either stochastic or geometric placement techniques. New GW candidate events that we detect using our HM banks and details for combining the different SNR mode timeseries are presented in accompanying papers: Wadekar et al. [1] and [2] respectively. Additionally, we discuss non-linear compression of $(2,2)$-only geometric-placement template banks using machine learning algorithms.

Submitted 16 October, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Comments: 12+2 pages, 8+1 figures. The code for generating our template banks and reproducing the plots in our paper is publicly available at https://github.com/JayWadekar/gwIAS-HM

Journal ref: Phys. Rev. D 110, 084035 (2024)

arXiv:2309.11414 [pdf, other]

doi 10.1109/ICRA57147.2024.10610519

EDMP: Ensemble-of-costs-guided Diffusion for Motion Planning

Authors: Kallol Saha, Vishal Mandadi, Jayaram Reddy, Ajit Srikanth, Aditya Agarwal, Bipasha Sen, Arun Singh, Madhava Krishna

Abstract: Classical motion planning for robotic manipulation includes a set of general algorithms that aim to minimize a scene-specific cost of executing a given plan. This approach offers remarkable adaptability, as these algorithms can be directly used off-the-shelf for any new scene without needing specific training datasets. However, without a prior understanding of what diverse valid trajectories are and without specially designed cost functions for a given scene, the overall solutions tend to have low success rates. While deep-learning-based algorithms tremendously improve success rates, they are much harder to adopt without specialized training datasets. We propose EDMP, an Ensemble-of-costs-guided Diffusion for Motion Planning that aims to combine the strengths of classical and deep-learning-based motion planning. Our diffusion-based network is trained on a set of diverse kinematically valid trajectories. Like classical planning, for any new scene at the time of inference, we compute scene-specific costs such as "collision cost" and guide the diffusion to generate valid trajectories that satisfy the scene-specific constraints. Further, instead of a single cost function that may be insufficient in capturing diversity across scenes, we use an ensemble of costs to guide the diffusion process, significantly improving the success rate compared to classical planners. EDMP performs comparably with SOTA deep-learning-based methods while retaining the generalization capabilities primarily associated with classical planners.

Submitted 20 September, 2023; originally announced September 2023.

Comments: 8 pages, 8 figures, submitted to ICRA 2024 (International Conference on Robotics and Automation)

Journal ref: 2024 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2307.14910 [pdf, other]

doi 10.1109/MedComNet58619.2023.10168852

Low-Latency Massive Access with Multicast Wake Up Radio

Authors: Anay Ajit Deshpande, Federico Chiariotti, Andrea Zanella

Abstract: The use of Wake-Up Radio (WUR) in Internet of Things (IoT) networks can significantly improve their energy efficiency: battery-powered sensors can remain in a low-power (sleep) mode while listening for wake-up messages using their WUR and reactivate only when polled, saving energy. However, polling-based Time Division Multiple Access (TDMA) may significantly increase data transmission delay if packets are generated sporadically, as nodes with no information still need to be polled. In this paper, we examine the effect of multicast polling for WUR-enabled wireless nodes. The idea is to assign nodes to multicast groups so that all nodes in the same group can be solicited by a multicast polling message. This may cause collisions, which can be solved by requesting retransmissions from the involved nodes. We analyze the performance of different multicast polling and retransmission strategies, showing that the optimal approach can significantly reduce the delay over TDMA and ALOHA in low-traffic scenarios while keeping good energy efficiency.

Submitted 27 July, 2023; originally announced July 2023.

Comments: 2023 21st Mediterranean Communication and Computer Networking Conference (MedComNet)

arXiv:2306.10797 [pdf, other]

Variability of echo state network prediction horizon for partially observed dynamical systems

Authors: Ajit Mahata, Reetish Padhi, Amit Apte

Abstract: The study of dynamical systems using partial state observation is an important problem due to its applicability to many real-world systems. We address the problem by studying an echo state network (ESN) framework with partial state input and partial or full state output. Applications to the Lorenz system and Chua's oscillator (both numerically simulated and experimental systems) demonstrate the effectiveness of our method. We show that the ESN, as an autonomous dynamical system, is capable of making short-term predictions up to a few Lyapunov times. However, the prediction horizon has high variability depending on the initial condition, an aspect that we explore in detail using the distribution of the prediction horizon. Further, using a variety of statistical metrics to compare the long-term dynamics of the ESN predictions with numerically simulated or experimental dynamics, we observe similar results and show that the ESN can effectively learn the system's dynamics even when trained with noisy numerical or experimental datasets. Thus, we demonstrate the potential of ESNs to serve as cheap surrogate models for simulating the dynamics of systems where complete observations are unavailable.

Submitted 5 December, 2023; v1 submitted 19 June, 2023; originally announced June 2023.

arXiv:2306.05495 [pdf, other]

Is Attentional Channel Processing Design Required? Comprehensive Analysis Of Robustness Between Vision Transformers And Fully Attentional Networks

Authors: Abhishri Ajit Medewar, Swanand Ashokrao Kavitkar

Abstract: Robustness testing has been performed for standard CNN models and Vision Transformers; however, there is no comprehensive study comparing the robustness of traditional Vision Transformers without an extra attentional channel design with that of the latest fully attentional network (FAN) models. In this paper, we use the ImageNet dataset to compare the robustness of FAN models with traditional Vision Transformers to understand the role of an attentional channel processing design using white box attacks, and we also study the transferability between the two using black box attacks.

Submitted 8 June, 2023; originally announced June 2023.

Comments: 4 pages, 12 figures

arXiv:2306.04944 [pdf, ps, other]

Colouring planar graphs with a precoloured induced cycle

Authors: Ajit Diwan

Abstract: Let $C$ be a cycle and $f : V(C) \rightarrow \{c_1,c_2,\ldots,c_k\}$ a proper $k$-colouring of $C$ for some $k \ge 4$. We say the colouring $f$ is safe if for any planar graph $G$ in which $C$ is an induced cycle, there exists a proper $k$-colouring $f'$ of $G$ such that $f'(v) = f(v)$ for all $v \in V(C)$. The only safe $4$-colouring is any proper colouring of a triangle. We give a simple necessary condition for a $k$-colouring of a cycle to be safe and conjecture that it is sufficient for all $k \ge 4$. The sufficiency for $k=4$ follows from the four colour theorem and we prove it for $k = 5$, independent of the four colour theorem. We show that a stronger condition is sufficient for all $k \ge 4$. As a consequence, it follows that any proper $k$-colouring of a cycle that uses at most $k-3$ distinct colours is safe. Also, any proper $k$-colouring of a cycle of length at most $2k-5$ that uses at most $k-1$ distinct colours is safe.

Submitted 8 June, 2023; originally announced June 2023.

Comments: 18 pages

MSC Class: 05C10; 05C15

arXiv:2305.16820 [pdf, other]

Domain Aligned Prefix Averaging for Domain Generalization in Abstractive Summarization

Authors: Pranav Ajit Nair, Sukomal Pal, Pradeepika Verma

Abstract: Domain generalization is hitherto an underexplored area in abstractive summarization. Moreover, most existing works on domain generalization have sophisticated training algorithms. In this paper, we propose a lightweight, weight-averaging-based Domain Aligned Prefix Averaging (DAPA) approach to domain generalization for abstractive summarization. Given a number of source domains, our method first trains a prefix for each one of them. These source prefixes generate summaries for a small number of target domain documents. The similarity of the generated summaries to their corresponding documents is used for calculating weights required to average source prefixes. In DAPA, prefix tuning allows for lightweight finetuning, and weight averaging allows for the computationally efficient addition of new source domains. When evaluated on four diverse summarization domains, DAPA shows comparable or better performance against the baselines, demonstrating the effectiveness of its prefix averaging scheme.

Submitted 29 May, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

Comments: 13 pages, Accepted to ACL 2023 Findings

arXiv:2305.15108 [pdf, other]

The Role of Output Vocabulary in T2T LMs for SPARQL Semantic Parsing

Authors: Debayan Banerjee, Pranav Ajit Nair, Ricardo Usbeck, Chris Biemann

Abstract: In this work, we analyse the role of output vocabulary for text-to-text (T2T) models on the task of SPARQL semantic parsing. We perform experiments within the context of knowledge graph question answering (KGQA), where the task is to convert questions in natural language to the SPARQL query language. We observe that the query vocabulary is distinct from human vocabulary. Language Models (LMs) are predominantly trained for human language tasks, and hence, if the query vocabulary is replaced with a vocabulary more attuned to the LM tokenizer, the performance of models may improve. We carry out carefully selected vocabulary substitutions on the queries and find absolute gains in the range of 17% on the GrailQA dataset.

Submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted as a short paper to ACL 2023 findings

arXiv:2305.07639 [pdf, other]

Efficient Neural Network based Classification and Outlier Detection for Image Moderation using Compressed Sensing and Group Testing

Authors: Sabyasachi Ghosh, Sanyam Saxena, Ajit Rajwade

Abstract: Popular social media platforms employ neural network based image moderation engines to classify images uploaded on them as having potentially objectionable content. Such moderation engines must answer a large number of queries with heavy computational cost, even though the actual number of images with objectionable content is usually a tiny fraction. Inspired by recent work on Neural Group Testing, we propose an approach which exploits this fact to reduce the overall computational cost of such engines using the technique of Compressed Sensing (CS). We present the quantitative matrix-pooled neural network (QMPNN), which takes as input $n$ images, and a $m \times n$ binary pooling matrix with $m < n$, whose rows indicate $m$ pools of images i.e. selections of $r$ images out of $n$. The QMPNN efficiently outputs the product of this matrix with the unknown sparse binary vector indicating whether each image is objectionable or not, i.e. it outputs the number of objectionable images in each pool. For suitable matrices, this is decoded using CS decoding algorithms to predict which images were objectionable. The computational cost of running the QMPNN and the CS algorithms is significantly lower than the cost of using a neural network with the same number of parameters separately on each image to classify the images, which we demonstrate via extensive experiments. Our technique is inherently resilient to moderate levels of errors in the prediction from the QMPNN.
Furthermore, we present pooled deep outlier detection, which brings CS and group testing techniques to deep outlier detection, to provide for the case when the objectionable images do not belong to a set of pre-defined classes. This technique enables efficient automated moderation of off-topic images shared on topical forums dedicated to sharing images of a certain single class, many of which are currently human-moderated.

Submitted 12 May, 2023; originally announced May 2023.
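
The pooled-count measurement described in the abstract above can be made concrete with a toy sketch. It is not the paper's method: the per-pool counts are simulated from a known label vector rather than predicted by a neural network, and a brute-force search for the sparsest consistent binary vector stands in for a real CS decoding algorithm. The names `pooled_counts` and `decode_sparsest` and the 4 x 6 pooling matrix are hypothetical.

```python
from itertools import combinations

def pooled_counts(A, x):
    """Compute y = A x: the number of flagged items in each pool (row of A)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def decode_sparsest(A, y, max_support):
    """Return the binary vector with the fewest ones that reproduces y.

    Exhaustive search over supports of increasing size; only feasible for
    tiny n, and used here purely as a stand-in for a CS decoder.
    """
    n = len(A[0])
    for k in range(max_support + 1):
        for support in combinations(range(n), k):
            x = [1 if j in support else 0 for j in range(n)]
            if pooled_counts(A, x) == y:
                return x
    return None

# m = 4 pools over n = 6 images, each pool selecting r = 3 images.
A = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
x_true = [0, 0, 0, 0, 1, 0]        # only image 4 is objectionable
y = pooled_counts(A, x_true)        # four pooled measurements instead of six
x_hat = decode_sparsest(A, y, max_support=2)
```

Because the columns of this matrix are distinct and nonzero, a single flagged image is always identified correctly from the four counts. The matrix does not guarantee unique recovery when two or more images are flagged, which is why the abstract restricts the claim to suitable matrices.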

arXiv:2305.04883 [pdf, other]

Fuzzy Gene Selection and Cancer Classification Based on Deep Learning Model

Authors: Mahmood Khalsan, Mu Mu, Eman Salih Al-Shamery, Lee Machado, Suraj Ajit, Michael Opoku Agyeman

Abstract: Machine learning (ML) approaches have been used to develop highly accurate and efficient applications in many fields including bio-medical science. However, even with advanced ML techniques, cancer classification using gene expression data is still complicated because of the high dimensionality of the datasets employed. We developed a new fuzzy gene selection technique (FGS) to identify informative genes to facilitate cancer classification and reduce the dimensionality of the available gene expression data. Three feature selection methods (Mutual Information, F-ClassIf, and Chi-squared) were evaluated and employed to obtain the score and rank for each gene. Fuzzification and defuzzification methods were then used to obtain the best single score for each gene, which aids in the identification of significant genes. Our study applied the fuzzy measures to six gene expression datasets, including four microarray and two RNA-seq datasets, to evaluate the proposed algorithm. With our FGS-enhanced method, the cancer classification model achieved 96.5%, 96.2%, 96%, and 95.9% for accuracy, precision, recall, and f1-score respectively, which is significantly higher than the 69.2% accuracy, 57.8% precision, 66% recall, and 58.2% f1-score obtained when the standard MLP method was used. In examining the six datasets that were used, the proposed model demonstrates its capacity to classify cancer effectively.

Submitted 4 May, 2023; originally announced May 2023.

Comments: Journal of Intelligent Information Systems (25,17)

arXiv:2304.11507 [pdf, other]

Machine learning framework for end-to-end implementation of Incident duration prediction

Authors: Smrithi Ajit, Varsha R Mouli, Skylar Knickerbocker, Jonathan S. Wood

Abstract: Traffic congestion caused by non-recurring incidents such as vehicle crashes and debris is a key issue for Traffic Management Centers (TMCs). Clearing incidents in a timely manner is essential for improving safety and reducing delays and emissions for the traveling public. However, TMCs and other responders face a challenge in predicting the duration of incidents (until the roadway is clear), which makes it difficult to decide what resources to deploy. To address this problem, this research developed an analytical framework and end-to-end machine-learning solution for predicting incident duration based on information available as soon as an incident report is received. Quality predictions of incident duration can help TMCs and other responders take a proactive approach in deploying responder services such as tow trucks and maintenance crews, or in activating alternative routes. The predictions use a combination of classification and regression machine learning modules. The performance of the developed solution has been evaluated based on the Mean Absolute Error (MAE), or deviation from the actual incident duration, as well as Area Under the Curve (AUC) and Mean Absolute Percentage Error (MAPE). The results showed that the framework significantly improved incident duration prediction compared to methods from previous research.

Submitted 22 April, 2023; originally announced April 2023.

arXiv:2304.11277 [pdf, other]

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

Authors: Yanli Zhao, Andrew Gu, Rohan Varma, Liang Luo, Chien-Chin Huang, Min Xu, Less Wright, Hamid Shojanazeri, Myle Ott, Sam Shleifer, Alban Desmaison, Can Balioglu, Pritam Damania, Bernard Nguyen, Geeta Chauhan, Yuchen Hao, Ajit Mathews, Shen Li

Abstract: It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains. Despite the remarkable progress made in the field of machine learning systems research, which has enabled the development and exploration of large models, such abilities remain confined to a small group of advanced users and industry leaders, resulting in an implicit technical barrier for the wider community to access and leverage these technologies. In this paper, we introduce PyTorch Fully Sharded Data Parallel (FSDP) as an industry-grade solution for large model training. FSDP has been closely co-designed with several key PyTorch core components including Tensor implementation, dispatcher system, and CUDA memory caching allocator, to provide non-intrusive user experiences and high training efficiency. Additionally, FSDP natively incorporates a range of techniques and settings to optimize resource utilization across a variety of hardware configurations. The experimental results demonstrate that FSDP is capable of achieving comparable performance to Distributed Data Parallel while providing support for significantly larger models with near-linear scalability in terms of TFLOPS.

Submitted 12 September, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

arXiv:2304.08769 [pdf, ps, other]

Cooperative Multi-Agent Reinforcement Learning for Inventory Management

Authors: Madhav Khirwar, Karthik S. Gurumoorthy, Ankit Ajit Jain, Shantala Manchenahally

Abstract: With Reinforcement Learning (RL) for inventory management (IM) being a nascent field of research, approaches tend to be limited to simple, linear environments with implementations that are minor modifications of off-the-shelf RL algorithms. Scaling these simplistic environments to a real-world supply chain comes with a few challenges such as: minimizing the computational requirements of the environment, specifying agent configurations that are representative of dynamics at real world stores and warehouses, and specifying a reward framework that encourages desirable behavior across the whole supply chain. In this work, we present a system with a custom GPU-parallelized environment that consists of one warehouse and multiple stores, a novel architecture for agent-environment dynamics incorporating enhanced state and action spaces, and a shared reward specification that seeks to optimize for a large retailer's supply chain needs. Each vertex in the supply chain graph is an independent agent that, based on its own inventory, is able to place replenishment orders to the vertex upstream. The warehouse agent, aside from placing orders from the supplier, has the special property of also being able to constrain replenishment to stores downstream, which results in it learning an additional allocation sub-policy. We achieve a system that outperforms standard inventory control policies such as a base-stock policy and other RL-based specifications for 1 product, and lay out a future direction of work for multiple products.

Submitted 18 April, 2023; originally announced April 2023.

Comments: 14 pages, 5 figures

arXiv:2304.08740 [pdf, other]

Estimating Joint Probability Distribution With Low-Rank Tensor Decomposition, Radon Transforms and Dictionaries

Authors: Pranava Singhal, Waqar Mirza, Ajit Rajwade, Karthik S. Gurumoorthy

Abstract: In this paper, we describe a method for estimating the joint probability density from data samples by assuming that the underlying distribution can be decomposed as a mixture of product densities with few mixture components. Prior works have used such a decomposition to estimate the joint density from lower-dimensional marginals, which can be estimated more reliably with the same number of samples. We combine two key ideas: dictionaries to represent 1-D densities, and random projections to estimate the joint distribution from 1-D marginals, explored separately in prior work. Our algorithm benefits from improved sample complexity over the previous dictionary-based approach by using 1-D marginals for reconstruction. We evaluate the performance of our method on estimating synthetic probability densities and compare it with the previous dictionary-based approach and Gaussian Mixture Models (GMMs). Our algorithm outperforms these other approaches in all the experimental settings.

Submitted 18 April, 2023; originally announced April 2023.

MSC Class: 62G07

arXiv:2304.06376 [pdf, other]

Signal Reconstruction from Samples at Unknown Locations with Application to 2D Unknown View Tomography

Authors: Sheel Shah, Kaishva Shah, Karthik S. Gurumoorthy, Ajit Rajwade

Abstract: It is well known that a band-limited signal can be reconstructed from its uniformly spaced samples if the sampling rate is sufficiently high. More recently, it has been proved that one can reconstruct a 1D band-limited signal even if the exact sample locations are unknown, but given a uniform distribution of the sample locations and their ordering in 1D. In this work, we extend the analytical error bounds in such scenarios for quasi-bandlimited (QBL) signals, and for the case of arbitrary but known sampling distributions. We also prove that such reconstruction methods are resilient to a certain proportion of errors in the specification of the sample location ordering. We then express the problem of tomographic reconstruction of 2D images from 1D Radon projections under unknown angles (2D UVT) with known angle distribution, as a special case for reconstruction of QBL signals from samples at unknown locations with known distribution. Building upon our theoretical background, we present asymptotic bounds for 2D QBL image reconstruction from 1D Radon projections in the unknown angles setting, and present an extensive set of simulations to verify these bounds in varied parameter regimes. To the best of our knowledge, this is the first piece of work to perform such an analysis for 2D UVT and explicitly relate it to advances in sampling theory, even though the associated reconstruction algorithms have been known for a long time.

Submitted 18 December, 2024; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: This is a preprint of a paper accepted to Signal Processing (Elsevier)

arXiv:2304.00086 [pdf, other]

Machine Learning for Economics Research: When What and How?

Authors: Ajit Desai

Abstract: This article provides a curated review of selected papers published in prominent economics journals that use machine learning (ML) tools for research and policy analysis. The review focuses on three key questions: (1) when ML is used in economics, (2) what ML models are commonly preferred, and (3) how they are used for economic applications. The review highlights that ML is particularly used to process nontraditional and unstructured data, capture strong nonlinearity, and improve prediction accuracy. Deep learning models are suitable for nontraditional data, whereas ensemble learning models are preferred for traditional datasets. While traditional econometric models may suffice for analyzing low-complexity data, the increasing complexity of economic data due to rapid digitalization and the growing literature suggests that ML is becoming an essential addition to the econometrician's toolbox.

Submitted 20 April, 2023; v1 submitted 31 March, 2023; originally announced April 2023.

arXiv:2303.13284 [pdf, other]

GETT-QA: Graph Embedding based T2T Transformer for Knowledge Graph Question Answering

Authors: Debayan Banerjee, Pranav Ajit Nair, Ricardo Usbeck, Chris Biemann

Abstract: In this work, we present an end-to-end Knowledge Graph Question Answering (KGQA) system named GETT-QA. GETT-QA uses T5, a popular text-to-text pre-trained language model. The model takes a question in natural language as input and produces a simpler form of the intended SPARQL query. In the simpler form, the model does not directly produce entity and relation IDs. Instead, it produces corresponding entity and relation labels. The labels are grounded to KG entity and relation IDs in a subsequent step. To further improve the results, we instruct the model to produce a truncated version of the KG embedding for each entity. The truncated KG embedding enables a finer search for disambiguation purposes. We find that T5 is able to learn the truncated KG embeddings without any change of loss function, improving KGQA performance. As a result, we report strong results for LC-QuAD 2.0 and SimpleQuestions-Wikidata datasets on end-to-end KGQA over Wikidata.

Submitted 28 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: 16 pages single column format accepted at ESWC 2023 research track

arXiv:2303.06277 [pdf, other]

SPOTR: Spatio-temporal Pose Transformers for Human Motion Prediction

Authors: Avinash Ajit Nargund, Misha Sra

Abstract: 3D human motion prediction is a research area of high significance and a challenge in computer vision. It is useful for the design of many applications including robotics and autonomous driving. Traditionally, autoregressive models have been used to predict human motion. However, these models have high computation needs and error accumulation that make it difficult to use them for real-time applications. In this paper, we present a non-autoregressive model for human motion prediction. We focus on learning spatio-temporal representations non-autoregressively for generation of plausible future motions. We propose a novel architecture that leverages the recently proposed Transformers. Human motion involves complex spatio-temporal dynamics with joints affecting the position and rotation of each other even though they are not connected directly. The proposed model extracts these dynamics using both convolutions and the self-attention mechanism. Using specialized spatial and temporal self-attention to augment the features extracted through convolution allows our model to generate spatio-temporally coherent predictions in parallel independent of the activity.
Our contributions are threefold: (i) we frame human motion prediction as a sequence-to-sequence problem and propose a non-autoregressive Transformer to forecast a sequence of poses in parallel; (ii) our method is activity agnostic; (iii) we show that despite its simplicity, our approach is able to make accurate predictions, achieving better or comparable results compared to the state-of-the-art on two public datasets, with far fewer parameters and much faster inference.

Submitted 10 March, 2023; originally announced March 2023.
