KARL: Kalman-Filter Assisted Reinforcement Learner for Dynamic Object Tracking and Grasping

Authors: Kowndinya Boyalakuntla, Abdeslam Boularias, Jingjin Yu

Abstract: We present Kalman-filter Assisted Reinforcement Learner (KARL) for dynamic object tracking and grasping over eye-on-hand (EoH) systems, significantly expanding such systems' capabilities in challenging, realistic environments. In comparison to the previous state-of-the-art, KARL (1) incorporates a novel six-stage RL curriculum that doubles the system's motion range, thereby greatly enhancing the system's grasping performance, (2) integrates a robust Kalman filter layer between the perception and reinforcement learning (RL) control modules, enabling the system to maintain an uncertain but continuous 6D pose estimate even when the target object temporarily exits the camera's field-of-view or undergoes rapid, unpredictable motion, and (3) introduces mechanisms to allow retries to gracefully recover from unavoidable policy execution failures. Extensive evaluations conducted in both simulation and real-world experiments qualitatively and quantitatively corroborate KARL's advantage over earlier systems, achieving higher grasp success rates and faster robot execution speed. Source code and supplementary materials for KARL will be made available at: https://github.com/arc-l/karl.

Submitted 18 June, 2025; originally announced June 2025.

arXiv:2409.00499 [pdf, other]

DAP: Diffusion-based Affordance Prediction for Multi-modality Storage

Authors: Haonan Chang, Kowndinya Boyalakuntla, Yuhan Liu, Xinyu Zhang, Liam Schramm, Abdeslam Boularias

Abstract: Solving storage problems, where objects must be accurately placed into containers with precise orientations and positions, presents a distinct challenge that extends beyond traditional rearrangement tasks. These challenges are primarily due to the need for fine-grained 6D manipulation and the inherent multi-modality of solution spaces, where multiple viable goal configurations exist for the same storage container. We present a novel Diffusion-based Affordance Prediction (DAP) pipeline for the multi-modal object storage problem. DAP leverages a two-step approach, initially identifying a placeable region on the container and then precisely computing the relative pose between the object and that region. Existing methods either struggle with multi-modality issues or require computation-intensive training. Our experiments demonstrate DAP's superior performance and training efficiency over the current state-of-the-art RPDiff, achieving remarkable results on the RPDiff benchmark. Additionally, our experiments showcase DAP's data efficiency in real-world applications, an advancement over existing simulation-driven approaches. Our contribution fills a gap in robotic manipulation research by offering a solution that is both computationally efficient and capable of handling real-world variability. Code and supplementary material can be found at: https://github.com/changhaonan/DPS.git.

Submitted 31 August, 2024; originally announced September 2024.

Comments: Paper accepted by IROS 2024. The arXiv version is 8 pages.

arXiv:2309.15940 [pdf, other]

Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs

Authors: Haonan Chang, Kowndinya Boyalakuntla, Shiyang Lu, Siwei Cai, Eric Jing, Shreesh Keskar, Shijie Geng, Adeeb Abbas, Lifeng Zhou, Kostas Bekris, Abdeslam Boularias

Abstract: We present an Open-Vocabulary 3D Scene Graph (OVSG), a formal framework for grounding a variety of entities, such as object instances, agents, and regions, with free-form text-based queries. Unlike conventional semantic-based object localization approaches, our system facilitates context-aware entity localization, allowing for queries such as "pick up a cup on a kitchen table" or "navigate to a sofa on which someone is sitting". In contrast to existing research on 3D scene graphs, OVSG supports free-form text input and open-vocabulary querying. Through a series of comparative experiments using the ScanNet dataset and a self-collected dataset, we demonstrate that our proposed approach significantly surpasses the performance of previous semantic-based localization techniques. Moreover, we highlight the practical application of OVSG in real-world robot navigation and manipulation experiments.

Submitted 27 September, 2023; originally announced September 2023.

Comments: The code and dataset used for evaluation can be found at https://github.com/changhaonan/OVSG. This paper has been accepted by CoRL 2023.

arXiv:2309.15821 [pdf, other]

LGMCTS: Language-Guided Monte-Carlo Tree Search for Executable Semantic Object Rearrangement

Authors: Haonan Chang, Kai Gao, Kowndinya Boyalakuntla, Alex Lee, Baichuan Huang, Harish Udhaya Kumar, Jinjin Yu, Abdeslam Boularias

Abstract: We introduce a novel approach to the executable semantic object rearrangement problem. In this challenge, a robot seeks to create an actionable plan that rearranges objects within a scene according to a pattern dictated by a natural language description. Unlike existing methods such as StructFormer and StructDiffusion, which tackle the issue in two steps by first generating poses and then leveraging a task planner for action plan formulation, our method concurrently addresses pose generation and action planning. We achieve this integration using a Language-Guided Monte-Carlo Tree Search (LGMCTS). Quantitative evaluations are provided on two simulation datasets, and complemented by qualitative tests with a real robot.

Submitted 7 October, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: Our code and supplementary materials are accessible at https://github.com/changhaonan/LG-MCTS

arXiv:2204.03858 [pdf, other]

eGEN: An Energy-saving Modeling Language and Code Generator for Location-sensing of Mobile Apps

Authors: Kowndinya Boyalakuntla, Marimuthu C, Sridhar Chimalakonda, Chandrasekaran K

Abstract: The demand for reducing the energy consumption of location-based applications has increased in recent years. The abnormal battery-draining behavior of GPS makes it difficult for developers to decide on battery optimization directly during the development phase. It will reduce the burden on developers if battery-saving strategies are considered early, and relevant battery-aware code is generated from the design phase artifacts. Therefore, we aim to develop tool support, eGEN, to specify and create native location-based mobile apps. eGEN consists of a Domain-specific Modeling Language (DSML) and a code generator for location-sensing. It is developed using Xtext and Xtend as an Eclipse plug-in, and currently, it supports native Android apps. eGEN is evaluated through controlled experiments by instrumenting the generated code in five location-based open-source Android applications. The experimental results show 4.35 minutes of average GPS reduction per hour and 188 mA of average reduction in battery consumption, with a degradation of only 97 meters in location accuracy over 3 kilometers of a cycling path. Hence, we believe that code generated by eGEN would help developers balance the energy and accuracy requirements of location-based applications. The source code, documentation, tool demo video, and tool installation video are available at https://github.com/Kowndinya2000/egen.

Submitted 8 April, 2022; originally announced April 2022.

Comments: 27 pages, 7 figures, 6 tables

arXiv:2107.06799 [pdf, other]

WAccess -- A Web Accessibility Tool based on WCAG 2.2, 2.1 and 2.0 Guidelines

Authors: Kowndinya Boyalakuntla, Akhila Sri Manasa Venigalla, Sridhar Chimalakonda

Abstract: The vision of providing access to all web content equally for all users makes web accessibility a fundamental goal of today's internet. Web accessibility is the practice of removing barriers from websites that could hinder functionality for users with various disabilities. Web accessibility is measured against accessibility guidelines such as WCAG, GIGW, and so on. WCAG 2.2 is the latest set of guidelines for web accessibility that helps in making websites accessible. The web accessibility tools available in the World Wide Web Consortium (W3C) only conform up to WCAG 2.1 guidelines, while no tools exist for the latest set of guidelines. Despite the availability of several tools to check the conformity of websites with WCAG 2.1 guidelines, there is a scarcity of tools that are both open source and scalable. To support automated accessibility evaluation of numerous websites against WCAG 2.2, 2.1, and 2.0, we present a tool, WAccess. WAccess highlights violations of 13 guidelines from WCAG 2.0, 9 guidelines from WCAG 2.1, and 7 guidelines from WCAG 2.2 on a specific web page in the web console, and suggests fixes for the violations while identifying the violating code snippets. We evaluated WAccess against 2227 government websites of India and observed a total of about 6.1 million violations.

Submitted 20 September, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

Comments: 17 pages, 7 figures, 4 tables

Showing 1–6 of 6 results for author: Boyalakuntla, K