-
Deep Learning-Assisted Detection of Sarcopenia in Cross-Sectional Computed Tomography Imaging
Authors:
Manish Bhardwaj,
Huizhi Liang,
Ashwin Sivaharan,
Sandip Nandhra,
Vaclav Snasel,
Tamer El-Sayed,
Varun Ojha
Abstract:
Sarcopenia is a progressive loss of muscle mass and function linked to poor surgical outcomes such as prolonged hospital stays, impaired mobility, and increased mortality. Although it can be assessed through cross-sectional imaging by measuring skeletal muscle area (SMA), the process is time-consuming and adds to clinical workloads, limiting timely detection and management; however, this process could become more efficient and scalable with the assistance of artificial intelligence applications. This paper presents high-quality three-dimensional cross-sectional computed tomography (CT) images of patients with sarcopenia collected at the Freeman Hospital, Newcastle upon Tyne Hospitals NHS Foundation Trust. Expert clinicians manually annotated the SMA at the third lumbar vertebra, generating precise segmentation masks. We develop deep-learning models to measure SMA in CT images and automate this task. Our methodology employed transfer learning and self-supervised learning approaches using labelled and unlabelled CT scan datasets. While we developed qualitative assessment models for detecting sarcopenia, we observed that the quantitative assessment of SMA is more precise and informative. This approach also mitigates the issue of class imbalance and limited data availability. Our model predicted the SMA, on average, with an error of ±3 percentage points against the manually measured SMA. The average Dice similarity coefficient of the predicted masks was 93%. Our results, therefore, show a pathway to full automation of sarcopenia assessment and detection.
Submitted 24 August, 2025;
originally announced August 2025.
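The abstract above reports mask quality as a Dice similarity coefficient. As a minimal sketch of that metric (not the authors' code), the function below computes it for two binary masks given as nested lists of 0/1 values; the name `dice_coefficient` and the toy masks are assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient, 2|A∩B| / (|A| + |B|), for two binary masks.

    Both masks are nested lists of 0/1 values with the same shape
    (a hypothetical stand-in for a predicted and a manual SMA mask).
    """
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p and t for p, t in zip(pred_flat, truth_flat))
    total = sum(pred_flat) + sum(truth_flat)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 3 predicted pixels, 2 reference pixels, 2 shared.
predicted = [[1, 1, 0], [0, 1, 0]]
reference = [[1, 1, 0], [0, 0, 0]]
score = dice_coefficient(predicted, reference)  # 2*2 / (3+2) = 0.8
```

A value of 1.0 means the two masks coincide exactly; 0.0 means they share no pixels.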
-
Interpretable Emergent Language Using Inter-Agent Transformers
Authors:
Mannan Bhardwaj
Abstract:
This paper explores the emergence of language in multi-agent reinforcement learning (MARL) using transformers. Existing methods such as RIAL, DIAL, and CommNet enable agent communication but lack interpretability. We propose Differentiable Inter-Agent Transformers (DIAT), which leverage self-attention to learn symbolic, human-understandable communication protocols. Through experiments, DIAT demonstrates the ability to encode observations into interpretable vocabularies and meaningful embeddings, effectively solving cooperative tasks. These results highlight the potential of DIAT for interpretable communication in complex multi-agent environments.
Submitted 4 May, 2025;
originally announced May 2025.
-
Co-Change Graph Entropy: A New Process Metric for Defect Prediction
Authors:
Ethari Hrishikesh,
Amit Kumar,
Meher Bhardwaj,
Sonali Agarwal
Abstract:
Process metrics, valued for their language independence and ease of collection, have been shown to outperform product metrics in defect prediction. Among these, change entropy (Hassan, 2009) is widely used at the file level and has proven highly effective. Additionally, past research suggests that co-change patterns provide valuable insights into software quality. Building on these findings, we introduce Co-Change Graph Entropy, a novel metric that models co-changes as a graph to quantify co-change scattering. Experiments on eight Apache projects reveal a significant correlation between co-change entropy and defect counts at the file level, with a Pearson correlation coefficient of up to 0.54. In file-level defect classification, replacing change entropy with co-change entropy improves AUROC in 72.5% of cases and MCC in 62.5% across 40 experimental settings (five machine learning classifiers and eight projects), though these improvements are not statistically significant. However, when co-change entropy is combined with change entropy, AUROC improves in 82.5% of cases and MCC in 65%, with statistically significant gains confirmed via the Friedman test followed by the post-hoc Nemenyi test. These results indicate that co-change entropy complements change entropy, significantly enhancing defect classification performance and underscoring its practical importance in defect prediction.
Submitted 25 April, 2025;
originally announced April 2025.
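The abstract above builds on change entropy, which treats the modifications in a time window as a probability distribution over files and takes its Shannon entropy. The sketch below illustrates only that baseline idea; the function name, the `normalize` option, and the toy change log are assumptions, and the graph-based co-change variant is not reproduced because the abstract does not define it.

```python
import math
from collections import Counter


def change_entropy(changed_files, normalize=False):
    """Shannon entropy (in bits) of how changes are spread across files.

    changed_files: one entry per file modification in the period,
    e.g. ["a.py", "b.py", "a.py"] (hypothetical change log).
    """
    counts = Counter(changed_files)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    if normalize and len(counts) > 1:
        # Scale to [0, 1] by the maximum entropy for this many files.
        entropy /= math.log2(len(counts))
    return entropy


scattered = change_entropy(["a", "b", "c", "d"])   # changes spread evenly: 2.0 bits
focused = change_entropy(["a", "a", "a", "a"])     # all changes in one file: 0 bits
```

Higher values indicate that changes are scattered across many files, which is the property such process metrics relate to defect-proneness.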
-
Dynamic Non-Prehensile Object Transport via Model-Predictive Reinforcement Learning
Authors:
Neel Jawale,
Byron Boots,
Balakumar Sundaralingam,
Mohak Bhardwaj
Abstract:
We investigate the problem of teaching a robot manipulator to perform dynamic non-prehensile object transport, also known as the `robot waiter' task, from a limited set of real-world demonstrations. We propose an approach that combines batch reinforcement learning (RL) with model-predictive control (MPC) by pretraining an ensemble of value functions from demonstration data, and utilizing them online within an uncertainty-aware MPC scheme to ensure robustness to limited data coverage. Our approach is straightforward to integrate with off-the-shelf MPC frameworks and enables learning solely from task space demonstrations with sparsely labeled transitions, while leveraging MPC to ensure smooth joint space motions and constraint satisfaction. We validate the proposed approach through extensive simulated and real-world experiments on a Franka Panda robot performing the robot waiter task and demonstrate robust deployment of value functions learned from 50-100 demonstrations. Furthermore, our approach enables generalization to novel objects not seen during training and can improve upon suboptimal demonstrations. We believe that such a framework can reduce the burden of providing extensive demonstrations and facilitate rapid training of robot manipulators to perform non-prehensile manipulation tasks. Project videos and supplementary material can be found at: https://sites.google.com/view/cvmpc.
Submitted 26 November, 2024;
originally announced December 2024.
-
EzSQL: An SQL intermediate representation for improving SQL-to-text Generation
Authors:
Meher Bhardwaj,
Hrishikesh Ethari,
Dennis Singh Moirangthem
Abstract:
The SQL-to-text generation task traditionally uses template-based, Seq2Seq, tree-to-sequence, and graph-to-sequence models. Recent models take advantage of pre-trained generative language models for this task in the Seq2Seq framework. However, treating SQL as a sequence of inputs to the pre-trained models is not optimal. In this work, we put forward a new SQL intermediate representation called EzSQL to align SQL with the natural language text sequence. EzSQL simplifies the SQL queries and brings them closer to natural language text by modifying operators and keywords, which can usually be described in natural language. EzSQL also removes the need for set operators. Our proposed SQL-to-text generation model uses EzSQL as the input to a pre-trained generative language model for generating the text descriptions. We demonstrate that our model is an effective state-of-the-art method to generate text narrations from SQL queries on the WikiSQL and Spider datasets. We also show that by generating pretraining data using our SQL-to-text generation model, we can enhance the performance of Text-to-SQL parsers.
Submitted 9 April, 2025; v1 submitted 28 November, 2024;
originally announced November 2024.
-
Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning
Authors:
Mohak Bhardwaj,
Thomas Lampe,
Michael Neunert,
Francesco Romano,
Abbas Abdolmaleki,
Arunkumar Byravan,
Markus Wulfmeier,
Martin Riedmiller,
Jonas Buchli
Abstract:
Recent advances in real-world applications of reinforcement learning (RL) have relied on the ability to accurately simulate systems at scale. However, domains such as fluid dynamical systems exhibit complex dynamic phenomena that are hard to simulate at high integration rates, limiting the direct application of modern deep RL algorithms to often expensive or safety critical hardware. In this work, we introduce "Box o Flows", a novel benchtop experimental control system for systematically evaluating RL algorithms in dynamic real-world scenarios. We describe the key components of the Box o Flows, and through a series of experiments demonstrate how state-of-the-art model-free RL algorithms can synthesize a variety of complex behaviors via simple reward specifications. Furthermore, we explore the role of offline RL in data-efficient hypothesis testing by reusing past experiences. We believe that the insights gained from this preliminary study and the availability of systems like the Box o Flows support the way forward for developing systematic RL algorithms that can be generally applied to complex, dynamical systems. Supplementary material and videos of experiments are available at https://sites.google.com/view/box-o-flows/home.
Submitted 8 February, 2024;
originally announced February 2024.
-
An innovative Deep Learning Based Approach for Accurate Agricultural Crop Price Prediction
Authors:
Mayank Ratan Bhardwaj,
Jaydeep Pawar,
Abhijnya Bhat,
Deepanshu,
Inavamsi Enaganti,
Kartik Sagar,
Y. Narahari
Abstract:
Accurate prediction of agricultural crop prices is a crucial input for decision-making by various stakeholders in agriculture: farmers, consumers, retailers, wholesalers, and the Government. These decisions have significant implications including, most importantly, the economic well-being of the farmers. In this paper, our objective is to accurately predict crop prices using historical price information, climate conditions, soil type, location, and other key determinants of crop prices. This is a technically challenging problem, which has been attempted before. In this paper, we propose an innovative deep learning based approach to achieve increased accuracy in price prediction. The proposed approach uses graph neural networks (GNNs) in conjunction with a standard convolutional neural network (CNN) model to exploit geospatial dependencies in prices. Our approach works well with noisy legacy data and produces a performance that is at least 20% better than the results available in the literature. We are able to predict prices up to 30 days ahead. We choose two vegetables, potato (stable price behavior) and tomato (volatile price behavior) and work with noisy public data available from Indian agricultural markets.
Submitted 15 April, 2023;
originally announced April 2023.
-
Designing Fair, Cost-optimal Auctions based on Deep Learning for Procuring Agricultural Inputs through Farmer Collectives
Authors:
Mayank Ratan Bhardwaj,
Bazil Ahmed,
Prathik Diwakar,
Ganesh Ghalme,
Y. Narahari
Abstract:
Procuring agricultural inputs (agri-inputs for short) such as seeds, fertilizers, and pesticides, at desired quality levels and at affordable cost, forms a critical component of agricultural input operations. This is a particularly challenging problem being faced by small and marginal farmers in any emerging economy. Farmer collectives (FCs), which are cooperative societies of farmers, offer an excellent prospect for enabling cost-effective procurement of inputs with assured quality to the farmers. In this paper, our objective is to design sound, explainable mechanisms by which an FC will be able to procure agri-inputs in bulk and distribute the inputs procured to the individual farmers who are members of the FC. In the methodology proposed here, an FC engages qualified suppliers in a competitive, volume discount procurement auction in which the suppliers specify price discounts based on volumes supplied. The desiderata of properties for such an auction include: minimization of the total cost of procurement; incentive compatibility; individual rationality; fairness; and other business constraints. An auction satisfying all these properties is analytically infeasible and a key contribution of this paper is to develop a deep learning based approach to design such an auction. We use two realistic, stylized case studies from chili seeds procurement and a popular pesticide procurement to demonstrate the efficacy of these auctions.
Submitted 14 April, 2023;
originally announced April 2023.
-
Adversarial Model for Offline Reinforcement Learning
Authors:
Mohak Bhardwaj,
Tengyang Xie,
Byron Boots,
Nan Jiang,
Ching-An Cheng
Abstract:
We propose a novel model-based offline Reinforcement Learning (RL) framework, called Adversarial Model for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary reference policy regardless of data coverage. ARMOR is designed to optimize policies for the worst-case performance relative to the reference policy through adversarially training a Markov decision process model. In theory, we prove that ARMOR, with a well-tuned hyperparameter, can compete with the best policy within data coverage when the reference policy is supported by the data. At the same time, ARMOR is robust to hyperparameter choices: the policy learned by ARMOR, with "any" admissible hyperparameter, would never degrade the performance of the reference policy, even when the reference policy is not covered by the dataset. To validate these properties in practice, we design a scalable implementation of ARMOR, which by adversarial training, can optimize policies without using model ensembles in contrast to typical model-based methods. We show that ARMOR achieves competent performance with both state-of-the-art offline model-free and model-based RL algorithms and can robustly improve the reference policy over various hyperparameter choices.
Submitted 24 December, 2023; v1 submitted 21 February, 2023;
originally announced February 2023.
-
ARMOR: A Model-based Framework for Improving Arbitrary Baseline Policies with Offline Data
Authors:
Tengyang Xie,
Mohak Bhardwaj,
Nan Jiang,
Ching-An Cheng
Abstract:
We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforcement Learning (ARMOR), which can robustly learn policies to improve upon an arbitrary baseline policy regardless of data coverage. Based on the concept of relative pessimism, ARMOR is designed to optimize for the worst-case relative performance when facing uncertainty. In theory, we prove that the learned policy of ARMOR never degrades the performance of the baseline policy with any admissible hyperparameter, and can learn to compete with the best policy within data coverage when the hyperparameter is well tuned, and the baseline policy is supported by the data. Such a robust policy improvement property makes ARMOR especially suitable for building real-world learning systems, because in practice ensuring no performance degradation is imperative before considering any benefit learning can bring.
Submitted 8 November, 2022;
originally announced November 2022.
-
Maxmin Participatory Budgeting
Authors:
Gogulapati Sreedurga,
Mayank Ratan Bhardwaj,
Y. Narahari
Abstract:
Participatory Budgeting (PB) is a popular voting method by which a limited budget is divided among a set of projects, based on the preferences of voters over the projects. PB is broadly categorised as divisible PB (if the projects are fractionally implementable) and indivisible PB (if the projects are atomic). Egalitarianism, an important objective in PB, has not received much attention in the context of indivisible PB. This paper addresses this gap through a detailed study of a natural egalitarian rule, Maxmin Participatory Budgeting (MPB), in the context of indivisible PB. Our study is in two parts: (1) computational and (2) axiomatic. In the first part, we prove that MPB is computationally hard and give pseudo-polynomial time and polynomial-time algorithms when parameterized by certain well-motivated parameters. We propose an algorithm that achieves additive approximation guarantees for MPB on restricted spaces of instances, and we empirically show that our algorithm in fact gives exact optimal solutions on real-world PB datasets. We also establish an upper bound on the approximation ratio achievable for MPB by the family of exhaustive strategy-proof PB algorithms. In the second part, we undertake an axiomatic study of the MPB rule by generalizing known axioms in the literature. Our study leads to the proposal of a new axiom, maximal coverage, which captures fairness aspects. We prove that MPB satisfies maximal coverage.
Submitted 29 April, 2022;
originally announced April 2022.
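The egalitarian objective described in the abstract above can be illustrated with a brute-force search over indivisible project bundles: among all bundles within budget, keep the one whose worst-off voter is best served. This is only a sketch under stated assumptions, not the paper's algorithm: voters submit approval ballots, a voter's utility is assumed to be the total cost of funded projects they approve, and all names and data are hypothetical. The search is exponential in the number of projects, consistent with the hardness result the abstract mentions.

```python
from itertools import combinations


def maxmin_pb(costs, budget, approvals):
    """Exhaustively find a feasible bundle maximizing the minimum voter utility.

    costs: dict mapping project name -> cost.
    budget: total budget available.
    approvals: non-empty list of sets, one approval ballot per voter.
    """
    projects = sorted(costs)
    best_set, best_value = frozenset(), -1
    for r in range(len(projects) + 1):
        for subset in combinations(projects, r):
            if sum(costs[p] for p in subset) > budget:
                continue  # indivisible projects: the bundle must fit entirely
            chosen = set(subset)
            # Assumed utility: total cost of the funded projects a voter approves.
            worst = min(sum(costs[p] for p in chosen & ballot) for ballot in approvals)
            if worst > best_value:
                best_set, best_value = frozenset(chosen), worst
    return best_set, best_value


costs = {"a": 2, "b": 2, "c": 3}
ballots = [{"a", "c"}, {"b", "c"}]
bundle, value = maxmin_pb(costs, budget=4, approvals=ballots)
# Funding only "c" gives both voters utility 3, beating {"a", "b"} (utility 2 each).
```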
-
Leveraging Experience in Lazy Search
Authors:
Mohak Bhardwaj,
Sanjiban Choudhury,
Byron Boots,
Siddhartha Srinivasa
Abstract:
Lazy graph search algorithms are efficient at solving motion planning problems where edge evaluation is the computational bottleneck. These algorithms work by lazily computing the shortest potentially feasible path, evaluating edges along that path, and repeating until a feasible path is found. The order in which edges are selected is critical to minimizing the total number of edge evaluations: a good edge selector chooses edges that are not only likely to be invalid, but also eliminate future paths from consideration. We wish to learn such a selector by leveraging prior experience. We formulate this problem as a Markov Decision Process (MDP) on the state of the search problem. While solving this large MDP is generally intractable, we show that we can compute oracular selectors that can solve the MDP during training. With access to such oracles, we use imitation learning to find effective policies. If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly. We evaluate our algorithms on a wide range of 2D and 7D problems and show that the learned selector outperforms commonly used baseline heuristics. We further provide a novel theoretical analysis of lazy search in a Bayesian framework as well as regret guarantees on our imitation learning based approach to motion planning.
Submitted 9 October, 2021;
originally announced October 2021.
-
STORM: An Integrated Framework for Fast Joint-Space Model-Predictive Control for Reactive Manipulation
Authors:
Mohak Bhardwaj,
Balakumar Sundaralingam,
Arsalan Mousavian,
Nathan Ratliff,
Dieter Fox,
Fabio Ramos,
Byron Boots
Abstract:
Sampling-based model-predictive control (MPC) is a promising tool for feedback control of robots with complex, non-smooth dynamics and cost functions. However, the computationally demanding nature of sampling-based MPC algorithms has been a key bottleneck in their application to high-dimensional robotic manipulation problems in the real world. Previous methods have addressed this issue by running MPC in the task space while relying on a low-level operational space controller for joint control. However, by not using the joint space of the robot in the MPC formulation, existing methods cannot directly account for non-task space related constraints such as avoiding joint limits, singular configurations, and link collisions. In this paper, we develop a system for fast, joint space sampling-based MPC for manipulators that is efficiently parallelized using GPUs. Our approach can handle task and joint space constraints while taking less than 8 ms (125 Hz) to compute the next control command. Further, our method can tightly integrate perception into the control problem by utilizing learned cost functions from raw sensor data. We validate our approach by deploying it on a Franka Panda robot for a variety of dynamic manipulation tasks. We study the effect of different cost formulations and MPC parameters on the synthesized behavior and provide key insights that pave the way for the application of sampling-based MPC for manipulators in a principled manner. We also provide highly optimized, open-source code to be used by the wider robot learning and control community. Videos of experiments can be found at: https://sites.google.com/view/manipulation-mpc
Submitted 14 September, 2021; v1 submitted 27 April, 2021;
originally announced April 2021.
-
Blending MPC & Value Function Approximation for Efficient Reinforcement Learning
Authors:
Mohak Bhardwaj,
Sanjiban Choudhury,
Byron Boots
Abstract:
Model-Predictive Control (MPC) is a powerful tool for controlling complex, real-world systems that uses a model to make predictions about future behavior. For each state encountered, MPC solves an online optimization problem to choose a control action that will minimize future cost. This is a surprisingly effective strategy, but real-time performance requirements warrant the use of simple models. If the model is not sufficiently accurate, then the resulting controller can be biased, limiting performance. We present a framework for improving on MPC with model-free reinforcement learning (RL). The key insight is to view MPC as constructing a series of local Q-function approximations. We show that by using a parameter $λ$, similar to the trace decay parameter in TD($λ$), we can systematically trade off learned value estimates against the local Q-function approximations. We present a theoretical analysis that shows how error from inaccurate models in MPC and value function estimation in RL can be balanced. We further propose an algorithm that changes $λ$ over time to reduce the dependence on MPC as our estimates of the value function improve, and test the efficacy of our approach on challenging high-dimensional manipulation tasks with biased models in simulation. We demonstrate that our approach can obtain performance comparable with MPC with access to true dynamics even under severe model bias and is more sample efficient as compared to model-free RL.
Submitted 13 April, 2021; v1 submitted 10 December, 2020;
originally announced December 2020.
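The abstract above describes trading off learned value estimates against model-based estimates with a parameter λ, by analogy with TD(λ). The sketch below shows the generic λ-return style of weighting, not the paper's exact estimator: `estimates[i]` is assumed to be a cost estimate that uses `i + 1` steps of model rollout followed by a learned terminal value, and the function name and numbers are hypothetical.

```python
def lambda_blend(estimates, lam):
    """Blend multi-step estimates with geometric, TD(lambda)-style weights.

    estimates: list [Q_1, ..., Q_H]; Q_i is assumed to use i model steps
               plus a learned value at the end of the rollout.
    lam: trade-off in [0, 1]; 0 keeps only the shortest rollout (leaning on
         the learned value), 1 keeps only the full-horizon rollout.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    horizon = len(estimates)
    blended = 0.0
    # Weights (1 - lam) * lam**i for all but the last estimate ...
    for i, q in enumerate(estimates[:-1]):
        blended += (1.0 - lam) * lam**i * q
    # ... and the remaining mass lam**(H - 1) on the full-horizon estimate,
    # so the weights always sum to one.
    blended += lam ** (horizon - 1) * estimates[-1]
    return blended


q = [1.0, 2.0, 3.0]
short_only = lambda_blend(q, 0.0)   # 1.0
mixed = lambda_blend(q, 0.5)        # 0.5*1 + 0.25*2 + 0.25*3 = 1.75
full_only = lambda_blend(q, 1.0)    # 3.0
```

Lowering `lam` over training shifts weight from long model rollouts toward the learned value, which matches the schedule direction the abstract describes.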
-
Hostility Detection Dataset in Hindi
Authors:
Mohit Bhardwaj,
Md Shad Akhtar,
Asif Ekbal,
Amitava Das,
Tanmoy Chakraborty
Abstract:
In this paper, we present a novel hostility detection dataset in the Hindi language. We collect and manually annotate ~8200 online posts. The annotated dataset covers four hostility dimensions: fake news, hate speech, offensive, and defamation posts, along with a non-hostile label. The hostile posts are also considered for multi-label tags due to a significant overlap among the hostile classes. We release this dataset as part of the CONSTRAINT-2021 shared task on hostile post detection.
Submitted 6 November, 2020;
originally announced November 2020.
-
No Rumours Please! A Multi-Indic-Lingual Approach for COVID Fake-Tweet Detection
Authors:
Debanjana Kar,
Mohit Bhardwaj,
Suranjana Samanta,
Amar Prakash Azad
Abstract:
The sudden widespread menace created by the present global pandemic COVID-19 has had an unprecedented effect on our lives. Mankind is experiencing immense fear and a dependence on social media like never before. Fear inevitably leads to panic, speculation, and the spread of misinformation. Many governments have taken measures to curb the spread of such misinformation for public well-being. Besides global measures, to have effective outreach, systems for demographically local languages have an important role to play in this effort. Towards this, we propose an approach to detect fake news about COVID-19 early on from social media, such as tweets, for multiple Indic languages besides English. In addition, we create an annotated dataset of Hindi and Bengali tweets for fake news detection. We propose a BERT-based model augmented with additional relevant features extracted from Twitter to identify fake tweets. To expand our approach to multiple Indic languages, we resort to an mBERT-based model which is fine-tuned over the created dataset in Hindi and Bengali. We also propose a zero-shot learning approach to alleviate the data scarcity issue for such low-resource languages. Through rigorous experiments, we show that our approach reaches around 89% F-Score in fake tweet detection, which supersedes the state-of-the-art (SOTA) results. Moreover, we establish the first benchmark for two Indic languages, Hindi and Bengali. Using our annotated data, our model achieves about 79% F-Score in Hindi and 81% F-Score for Bengali tweets. Our zero-shot model achieves about 81% F-Score in Hindi and 78% F-Score for Bengali tweets without any annotated data, which clearly indicates the efficacy of our approach.
Submitted 14 October, 2020;
originally announced October 2020.
-
Information Theoretic Model Predictive Q-Learning
Authors:
Mohak Bhardwaj,
Ankur Handa,
Dieter Fox,
Byron Boots
Abstract:
Model-free Reinforcement Learning (RL) works well when experience can be collected cheaply and model-based RL is effective when system dynamics can be modeled accurately. However, both assumptions can be violated in real world problems such as robotics, where querying the system can be expensive and real-world dynamics can be difficult to model. In contrast to RL, Model Predictive Control (MPC) algorithms use a simulator to optimize a simple policy class online, constructing a closed-loop controller that can effectively contend with real-world dynamics. MPC performance is usually limited by factors such as model bias and the limited horizon of optimization. In this work, we present a novel theoretical connection between information theoretic MPC and entropy regularized RL and develop a Q-learning algorithm that can leverage biased models. We validate the proposed algorithm on sim-to-sim control tasks to demonstrate the improvements over optimal control and reinforcement learning from scratch. Our approach paves the way for deploying reinforcement learning algorithms on real systems in a systematic manner.
Submitted 5 May, 2020; v1 submitted 30 December, 2019;
originally announced January 2020.
-
Differentiable Gaussian Process Motion Planning
Authors:
Mohak Bhardwaj,
Byron Boots,
Mustafa Mukadam
Abstract:
Modern trajectory optimization based approaches to motion planning are fast, easy to implement, and effective on a wide range of robotics tasks. However, trajectory optimization algorithms have parameters that are typically set in advance (and rarely discussed in detail). Setting these parameters properly can have a significant impact on the practical performance of the algorithm, sometimes making the difference between finding a feasible plan or failing at the task entirely. We propose a method for leveraging past experience to learn how to automatically adapt the parameters of Gaussian Process Motion Planning (GPMP) algorithms. Specifically, we propose a differentiable extension to the GPMP2 algorithm, so that it can be trained end-to-end from data. We perform several experiments that validate our algorithm and illustrate the benefits of our proposed learning-based approach to motion planning.
Submitted 11 March, 2020; v1 submitted 22 July, 2019;
originally announced July 2019.
-
Leveraging Experience in Lazy Search
Authors:
Mohak Bhardwaj,
Sanjiban Choudhury,
Byron Boots,
Siddhartha Srinivasa
Abstract:
Lazy graph search algorithms are efficient at solving motion planning problems where edge evaluation is the computational bottleneck. These algorithms work by lazily computing the shortest potentially feasible path, evaluating edges along that path, and repeating until a feasible path is found. The order in which edges are selected is critical to minimizing the total number of edge evaluations: a good edge selector chooses edges that are not only likely to be invalid but also eliminate future paths from consideration. We wish to learn such a selector by leveraging prior experience. We formulate this problem as a Markov Decision Process (MDP) on the state of the search problem. While solving this large MDP is generally intractable, we show that we can compute oracular selectors that can solve the MDP during training. With access to such oracles, we use imitation learning to find effective policies. If new search problems are sufficiently similar to problems solved during training, the learned policy will choose a good edge evaluation ordering and solve the motion planning problem quickly. We evaluate our algorithms on a wide range of 2D and 7D problems and show that the learned selector outperforms commonly used baseline heuristics.
Submitted 16 July, 2019;
originally announced July 2019.
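The lazy search loop described in the abstract above (compute the shortest potentially feasible path, evaluate an edge on it, repeat) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the graph representation, the names `shortest_path`, `lazy_search`, `is_edge_valid`, and `select_edge`, and the default "first unevaluated edge" selector are all assumptions made for illustration. The learned selector from the paper would take the place of `select_edge`.

```python
import heapq


def shortest_path(graph, start, goal, known_invalid):
    """Dijkstra over all edges not yet known to be invalid.

    graph: dict mapping node -> {neighbor: weight} (directed, hypothetical format).
    Returns a list of nodes from start to goal, or None if no path remains.
    """
    dist = {start: 0.0}
    parent = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            path = [u]
            while u in parent:
                u = parent[u]
                path.append(u)
            return path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if (u, v) in known_invalid:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return None


def lazy_search(graph, start, goal, is_edge_valid, select_edge=None):
    """Return (path, number_of_edge_evaluations); path is None on failure.

    is_edge_valid stands in for the expensive check (e.g. collision checking).
    select_edge picks one edge from the unevaluated edges on the current path;
    the default (an assumption) takes the first one.
    """
    known_valid, known_invalid = set(), set()
    evaluations = 0
    while True:
        path = shortest_path(graph, start, goal, known_invalid)
        if path is None:
            return None, evaluations
        unevaluated = [e for e in zip(path, path[1:]) if e not in known_valid]
        if not unevaluated:
            return path, evaluations  # every edge on the path has been verified
        edge = select_edge(unevaluated) if select_edge else unevaluated[0]
        evaluations += 1
        if is_edge_valid(edge):
            known_valid.add(edge)
        else:
            known_invalid.add(edge)


# Toy problem: the cheaper route s-a-g is blocked at its last edge.
graph = {"s": {"a": 1, "b": 2}, "a": {"g": 1}, "b": {"g": 2}, "g": {}}
blocked = {("a", "g")}
path_fwd, n_fwd = lazy_search(graph, "s", "g", lambda e: e not in blocked)
path_rev, n_rev = lazy_search(
    graph, "s", "g", lambda e: e not in blocked, select_edge=lambda es: es[-1]
)
```

On this toy graph both selectors return the same path, but they spend a different number of edge evaluations, which is the quantity an edge selector is meant to minimize.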
-
Data-driven Planning via Imitation Learning
Authors:
Sanjiban Choudhury,
Mohak Bhardwaj,
Sankalp Arora,
Ashish Kapoor,
Gireeja Ranade,
Sebastian Scherer,
Debadeepta Dey
Abstract:
Robot planning is the process of selecting a sequence of actions that optimize for a task specific objective. The optimal solutions to such tasks are heavily influenced by the implicit structure in the environment, i.e. the configuration of objects in the world. State-of-the-art planning approaches, however, do not exploit this structure, thereby expending valuable effort searching the action space instead of focusing on potentially good actions. In this paper, we address the problem of enabling planners to adapt their search strategies by inferring such good actions in an efficient manner using only the information uncovered by the search up until that time. We formulate this as a problem of sequential decision making under uncertainty where at a given iteration a planning policy must map the state of the search to a planning action. Unfortunately, the training process for such partial information based policies is slow to converge and susceptible to poor local minima. Our key insight is that if we could fully observe the underlying world map, we would easily be able to disambiguate between good and bad actions. We hence present a novel data-driven imitation learning framework to efficiently train planning policies by imitating a clairvoyant oracle - an oracle that at train time has full knowledge about the world map and can compute optimal decisions. We leverage the fact that for planning problems, such oracles can be efficiently computed and derive performance guarantees for the learnt policy. We examine two important domains that rely on partial information based policies - informative path planning and search based motion planning. We validate the approach on a spectrum of environments for both problem domains, including experiments on a real UAV, and show that the learnt policy consistently outperforms state-of-the-art algorithms.
Submitted 16 November, 2017;
originally announced November 2017.
-
Learning Heuristic Search via Imitation
Authors:
Mohak Bhardwaj,
Sanjiban Choudhury,
Sebastian Scherer
Abstract:
Robotic motion planning problems are typically solved by constructing a search tree of valid maneuvers from a start to a goal configuration. Limited onboard computation and real-time planning constraints impose a limit on how large this search tree can grow. Heuristics play a crucial role in such situations by guiding the search towards potentially good directions and consequently minimizing search effort. Moreover, a heuristic must infer such directions in an efficient manner using only the information uncovered by the search up until that time. However, state-of-the-art methods do not address the problem of computing a heuristic that explicitly minimizes search effort. In this paper, we do so by training a heuristic policy that maps the partial information from the search to a decision about which node of the search tree to expand. Unfortunately, naively training such policies leads to slow convergence and poor local minima. We present SaIL, an efficient algorithm that trains heuristic policies by imitating "clairvoyant oracles" - oracles that have full information about the world and demonstrate decisions that minimize search effort. We leverage the fact that such oracles can be efficiently computed using dynamic programming and derive performance guarantees for the learnt heuristic. We validate the approach on a spectrum of environments, which show that SaIL consistently outperforms state-of-the-art algorithms. Our approach paves the way forward for learning heuristics that demonstrate an anytime nature - finding feasible solutions quickly and incrementally refining them over time.
Submitted 10 July, 2017;
originally announced July 2017.
-
Loss of information in feedforward social networks
Authors:
Simon Stolarczyk,
Manisha Bhardwaj,
Kevin E. Bassler,
Wei Ji Ma,
Kresimir Josic
Abstract:
We consider model social networks in which information propagates directionally across layers of rational agents. Each agent makes a locally optimal estimate of the state of the world, and communicates this estimate to agents downstream. When agents receive information from the same source their estimates are correlated. We show that the resulting redundancy can lead to the loss of information about the state of the world across layers of the network, even when all agents have full knowledge of the network's structure. A simple algebraic condition identifies networks in which information loss occurs, and we show that all such networks must contain a particular network motif. We also study random networks asymptotically as the number of agents increases, and find a sharp transition in the probability of information loss at the point at which the number of agents in one layer exceeds the number in the previous layer.
Submitted 26 September, 2016;
originally announced September 2016.
-
Optimization of stochastic database cracking
Authors:
Meenesh Bhardwaj
Abstract:
Variant stochastic cracking is a significantly more resilient approach to adaptive indexing. It has been shown [1] that stochastic cracking uses each query as a hint on how to reorganize data, but not blindly so; it gains resilience and avoids performance bottlenecks by deliberately applying certain arbitrary choices in its decision making. It thereby brings adaptive indexing forward to a mature formulation that confers the workload robustness that previous approaches lacked. Original cracking relies on the randomness of the workload to converge well [2][3]. However, where the workload is non-random, cracking needs to introduce randomness on its own. Stochastic cracking clearly improves over original cracking by being robust to workload changes while maintaining all original cracking features when it comes to adaptation. Taken together, however, the two types of cracking convey an incomplete picture, because at some point one must know whether the workload is random or sequential. In this paper, we focus on optimizing variant stochastic cracking, which can be achieved in two ways: either by reducing the initialization cost to make stochastic cracking even more transparent to the user, especially for queries that initiate a workload change and hence incur a higher cost, or by combining the strengths of the various stochastic cracking algorithms via a dynamic component that decides on the fly which algorithm to choose for a query. We develop an algorithm that reduces the initialization cost by using the main notion of both types of cracking while considering the requirements of adaptive indexing [2].
Submitted 8 May, 2013;
originally announced May 2013.
-
A New Middle Path Approach For Alignements In Blast
Authors:
Deepak Garg,
S C Saxena,
L M Bhardwaj
Abstract:
This paper presents a new middle path approach developed to reduce alignment calculations in the BLAST algorithm. It introduces a new step into the BLAST algorithm between the ungapped and gapped alignments. This middle path step reduces the number of sequences that proceed to gapped alignment, which improves alignment speed by up to 30 percent.
Submitted 27 September, 2012;
originally announced September 2012.