-
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems
Authors:
Aniket Rege,
Zinnia Nie,
Mahesh Ramesh,
Unmesh Raskar,
Zhuoran Yu,
Aditya Kusupati,
Yong Jae Lee,
Ramya Korlakai Vinayak
Abstract:
Popular text-to-image (T2I) systems are trained on web-scraped data, which is heavily Amero and Euro-centric, underrepresenting the cultures of the Global South. To analyze these biases, we introduce CuRe, a novel and scalable benchmarking and scoring suite for cultural representativeness that leverages the marginal utility of attribute specification to T2I systems as a proxy for human judgments.…
▽ More
Popular text-to-image (T2I) systems are trained on web-scraped data, which is heavily Amero and Euro-centric, underrepresenting the cultures of the Global South. To analyze these biases, we introduce CuRe, a novel and scalable benchmarking and scoring suite for cultural representativeness that leverages the marginal utility of attribute specification to T2I systems as a proxy for human judgments. Our CuRe benchmark dataset has a novel categorical hierarchy built from the crowdsourced Wikimedia knowledge graph, with 300 cultural artifacts across 32 cultural subcategories grouped into six broad cultural axes (food, art, fashion, architecture, celebrations, and people). Our dataset's categorical hierarchy enables CuRe scorers to evaluate T2I systems by analyzing their response to increasing the informativeness of text conditioning, enabling fine-grained cultural comparisons. We empirically observe much stronger correlations of our class of scorers to human judgments of perceptual similarity, image-text alignment, and cultural diversity across image encoders (SigLIP 2, AIMV2 and DINOv2), vision-language models (OpenCLIP, SigLIP 2, Gemini 2.0 Flash) and state-of-the-art text-to-image systems, including three variants of Stable Diffusion (1.5, XL, 3.5 Large), FLUX.1 [dev], Ideogram 2.0, and DALL-E 3. The code and dataset is open-sourced and available at https://aniketrege.github.io/cure/.
△ Less
Submitted 9 June, 2025;
originally announced June 2025.
-
Bench-NPIN: Benchmarking Non-prehensile Interactive Navigation
Authors:
Ninghan Zhong,
Steven Caro,
Avraiem Iskandar,
Megnath Ramesh,
Stephen L. Smith
Abstract:
Mobile robots are increasingly deployed in unstructured environments where obstacles and objects are movable. Navigation in such environments is known as interactive navigation, where task completion requires not only avoiding obstacles but also strategic interactions with movable objects. Non-prehensile interactive navigation focuses on non-grasping interaction strategies, such as pushing, rather…
▽ More
Mobile robots are increasingly deployed in unstructured environments where obstacles and objects are movable. Navigation in such environments is known as interactive navigation, where task completion requires not only avoiding obstacles but also strategic interactions with movable objects. Non-prehensile interactive navigation focuses on non-grasping interaction strategies, such as pushing, rather than relying on prehensile manipulation. Despite a growing body of research in this field, most solutions are evaluated using case-specific setups, limiting reproducibility and cross-comparison. In this paper, we present Bench-NPIN, the first comprehensive benchmark for non-prehensile interactive navigation. Bench-NPIN includes multiple components: 1) a comprehensive range of simulated environments for non-prehensile interactive navigation tasks, including navigating a maze with movable obstacles, autonomous ship navigation in icy waters, box delivery, and area clearing, each with varying levels of complexity; 2) a set of evaluation metrics that capture unique aspects of interactive navigation, such as efficiency, interaction effort, and partial task completion; and 3) demonstrations using Bench-NPIN to evaluate example implementations of established baselines across environments. Bench-NPIN is an open-source Python library with a modular design. The code, documentation, and trained models can be found at https://github.com/IvanIZ/BenchNPIN.
△ Less
Submitted 17 May, 2025;
originally announced May 2025.
-
Integrating Model-based Control and RL for Sim2Real Transfer of Tight Insertion Policies
Authors:
Isidoros Marougkas,
Dhruv Metha Ramesh,
Joe H. Doerr,
Edgar Granados,
Aravind Sivaramakrishnan,
Abdeslam Boularias,
Kostas E. Bekris
Abstract:
Object insertion under tight tolerances ($< \hspace{-.02in} 1mm$) is an important but challenging assembly task as even small errors can result in undesirable contacts. Recent efforts focused on Reinforcement Learning (RL), which often depends on careful definition of dense reward functions. This work proposes an effective strategy for such tasks that integrates traditional model-based control wit…
▽ More
Object insertion under tight tolerances ($< \hspace{-.02in} 1mm$) is an important but challenging assembly task as even small errors can result in undesirable contacts. Recent efforts focused on Reinforcement Learning (RL), which often depends on careful definition of dense reward functions. This work proposes an effective strategy for such tasks that integrates traditional model-based control with RL to achieve improved insertion accuracy. The policy is trained exclusively in simulation and is zero-shot transferred to the real system. It employs a potential field-based controller to acquire a model-based policy for inserting a plug into a socket given full observability in simulation. This policy is then integrated with residual RL, which is trained in simulation given only a sparse, goal-reaching reward. A curriculum scheme over observation noise and action magnitude is used for training the residual RL policy. Both policy components use as input the SE(3) poses of both the plug and the socket and return the plug's SE(3) pose transform, which is executed by a robotic arm using a controller. The integrated policy is deployed on the real system without further training or fine-tuning, given a visual SE(3) object tracker. The proposed solution and alternatives are evaluated across a variety of objects and conditions in simulation and reality. The proposed approach outperforms recent RL-based methods in this domain and prior efforts with hybrid policies. Ablations highlight the impact of each component of the approach.
△ Less
Submitted 17 May, 2025;
originally announced May 2025.
-
PROBE: Proprioceptive Obstacle Detection and Estimation while Navigating in Clutter
Authors:
Dhruv Metha Ramesh,
Aravind Sivaramakrishnan,
Shreesh Keskar,
Kostas E. Bekris,
Jingjin Yu,
Abdeslam Boularias
Abstract:
In critical applications, including search-and-rescue in degraded environments, blockages can be prevalent and prevent the effective deployment of certain sensing modalities, particularly vision, due to occlusion and the constrained range of view of onboard camera sensors. To enable robots to tackle these challenges, we propose a new approach, Proprioceptive Obstacle Detection and Estimation while…
▽ More
In critical applications, including search-and-rescue in degraded environments, blockages can be prevalent and prevent the effective deployment of certain sensing modalities, particularly vision, due to occlusion and the constrained range of view of onboard camera sensors. To enable robots to tackle these challenges, we propose a new approach, Proprioceptive Obstacle Detection and Estimation while navigating in clutter PROBE, which instead relies only on the robot's proprioception to infer the presence or absence of occluded rectangular obstacles while predicting their dimensions and poses in SE(2). The proposed approach is a Transformer neural network that receives as input a history of applied torques and sensed whole-body movements of the robot and returns a parameterized representation of the obstacles in the environment. The effectiveness of PROBE is evaluated on simulated environments in Isaac Gym and with a real Unitree Go1 quadruped robot.
△ Less
Submitted 17 May, 2025;
originally announced May 2025.
-
Visuotactile-Based Learning for Insertion with Compliant Hands
Authors:
Osher Azulay,
Dhruv Metha Ramesh,
Nimrod Curtis,
Avishai Sintov
Abstract:
Compared to rigid hands, underactuated compliant hands offer greater adaptability to object shapes, provide stable grasps, and are often more cost-effective. However, they introduce uncertainties in hand-object interactions due to their inherent compliance and lack of precise finger proprioception as in rigid hands. These limitations become particularly significant when performing contact-rich tas…
▽ More
Compared to rigid hands, underactuated compliant hands offer greater adaptability to object shapes, provide stable grasps, and are often more cost-effective. However, they introduce uncertainties in hand-object interactions due to their inherent compliance and lack of precise finger proprioception as in rigid hands. These limitations become particularly significant when performing contact-rich tasks like insertion. To address these challenges, additional sensing modalities are required to enable robust insertion capabilities. This letter explores the essential sensing requirements for successful insertion tasks with compliant hands, focusing on the role of visuotactile perception (i.e., visual and tactile perception). We propose a simulation-based multimodal policy learning framework that leverages all-around tactile sensing and an extrinsic depth camera. A transformer-based policy, trained through a teacher-student distillation process, is successfully transferred to a real-world robotic system without further training. Our results emphasize the crucial role of tactile sensing in conjunction with visual perception for accurate object-socket pose estimation, successful sim-to-real transfer and robust task execution.
△ Less
Submitted 3 March, 2025; v1 submitted 10 November, 2024;
originally announced November 2024.
-
${\tt KRAFT}$: Sampling-Based Kinodynamic Replanning and Feedback Control over Approximate, Identified Models of Vehicular Systems
Authors:
Aravind Sivaramakrishnan,
Sumanth Tangirala,
Dhruv Metha Ramesh,
Edgar Granados,
Kostas E. Bekris
Abstract:
This paper aims to increase the safety and reliability of executing trajectories planned for robots with non-trivial dynamics given a light-weight, approximate dynamics model. Scenarios include mobile robots navigating through workspaces with imperfectly modeled surfaces and unknown friction. The proposed approach, Kinodynamic Replanning over Approximate Models with Feedback Tracking (KRAFT), inte…
▽ More
This paper aims to increase the safety and reliability of executing trajectories planned for robots with non-trivial dynamics given a light-weight, approximate dynamics model. Scenarios include mobile robots navigating through workspaces with imperfectly modeled surfaces and unknown friction. The proposed approach, Kinodynamic Replanning over Approximate Models with Feedback Tracking (KRAFT), integrates: (i) replanning via an asymptotically optimal sampling-based kinodynamic tree planner, with (ii) trajectory following via feedback control, and (iii) a safety mechanism to reduce collision due to second-order dynamics. The planning and control components use a rough dynamics model expressed analytically via differential equations, which is tuned via system identification (SysId) in a training environment but not the deployed one. This allows the process to be fast and achieve long-horizon reasoning during each replanning cycle. At the same time, the model still includes gaps with reality, even after SysID, in new environments. Experiments demonstrate the limitations of kinematic path planning and path tracking approaches, highlighting the importance of: (a) closing the feedback-loop also at the planning level; and (b) long-horizon reasoning, for safe and efficient trajectory execution given inaccurate models.
△ Less
Submitted 17 September, 2024;
originally announced September 2024.
-
Approximate Environment Decompositions for Robot Coverage Planning using Submodular Set Cover
Authors:
Megnath Ramesh,
Frank Imeson,
Baris Fidan,
Stephen L. Smith
Abstract:
In this paper, we investigate the problem of decomposing 2D environments for robot coverage planning. Coverage path planning (CPP) involves computing a cost-minimizing path for a robot equipped with a coverage or sensing tool so that the tool visits all points in the environment. CPP is an NP-Hard problem, so existing approaches simplify the problem by decomposing the environment into the minimum…
▽ More
In this paper, we investigate the problem of decomposing 2D environments for robot coverage planning. Coverage path planning (CPP) involves computing a cost-minimizing path for a robot equipped with a coverage or sensing tool so that the tool visits all points in the environment. CPP is an NP-Hard problem, so existing approaches simplify the problem by decomposing the environment into the minimum number of sectors. Sectors are sub-regions of the environment that can each be covered using a lawnmower path (i.e., along parallel straight-line paths) oriented at an angle. However, traditional methods either limit the coverage orientations to be axis-parallel (horizontal/vertical) or provide no guarantees on the number of sectors in the decomposition. We introduce an approach to decompose the environment into possibly overlapping rectangular sectors. We provide an approximation guarantee on the number of sectors computed using our approach for a given environment. We do this by leveraging the submodular property of the sector coverage function, which enables us to formulate the decomposition problem as a submodular set cover (SSC) problem with well-known approximation guarantees for the greedy algorithm. Our approach improves upon existing coverage planning methods, as demonstrated through an evaluation using maps of complex real-world environments.
△ Less
Submitted 4 September, 2024;
originally announced September 2024.
-
MABViT -- Modified Attention Block Enhances Vision Transformers
Authors:
Mahesh Ramesh,
Aswinkumar Ramkumar
Abstract:
Recent studies have demonstrated the effectiveness of Gated Linear Units (GLU) in enhancing transformer models, particularly in Large Language Models (LLMs). Additionally, utilizing a parallel configuration within each Transformer block rather than the conventional serialized method has been revealed to accelerate the training of LLMs without significantly impacting performance. However, when the…
▽ More
Recent studies have demonstrated the effectiveness of Gated Linear Units (GLU) in enhancing transformer models, particularly in Large Language Models (LLMs). Additionally, utilizing a parallel configuration within each Transformer block rather than the conventional serialized method has been revealed to accelerate the training of LLMs without significantly impacting performance. However, when the MLP and attention block were run in parallel for the image classification task, we observed a noticeable decline in performance. We propose a novel transformer variant that integrates non-linearity within the attention block to tackle this problem. We implemented the GLU-based activation function on the Value tensor, and this new technique surpasses the current state-of-the-art S/16 variant of Vision Transformers by 0.6% on the ImageNet-1K dataset while utilizing fewer parameters. It also supersedes the B/16 variant while using only half the parameters. Furthermore, we provide results with the GELU activation function variant to confirm our assertions. Lastly, we showcase that the MABViT variants exhibit greater potential when utilized in deep transformers compared to the standard architecture.
△ Less
Submitted 1 January, 2024; v1 submitted 3 December, 2023;
originally announced December 2023.
-
Anytime Replanning of Robot Coverage Paths for Partially Unknown Environments
Authors:
Megnath Ramesh,
Frank Imeson,
Baris Fidan,
Stephen L. Smith
Abstract:
In this paper, we propose a method to replan coverage paths for a robot operating in an environment with initially unknown static obstacles. Existing coverage approaches reduce coverage time by covering along the minimum number of coverage lines (straight-line paths). However, recomputing such paths online can be computationally expensive resulting in robot stoppages that increase coverage time. A…
▽ More
In this paper, we propose a method to replan coverage paths for a robot operating in an environment with initially unknown static obstacles. Existing coverage approaches reduce coverage time by covering along the minimum number of coverage lines (straight-line paths). However, recomputing such paths online can be computationally expensive resulting in robot stoppages that increase coverage time. A naive alternative is greedy detour replanning, i.e., replanning with minimum deviation from the initial path, which is efficient to compute but may result in unnecessary detours. In this work, we propose an anytime coverage replanning approach named OARP-Replan that performs near-optimal replans to an interrupted coverage path within a given time budget. We do this by solving linear relaxations of integer linear programs (ILPs) to identify sections of the interrupted path that can be optimally replanned within the time budget. We validate OARP-Replan in simulation and perform comparisons against a greedy detour replanner and other state-of-the-art coverage planners. We also demonstrate OARP-Replan in experiments using an industrial-level autonomous robot.
△ Less
Submitted 7 June, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
-
Mono-STAR: Mono-camera Scene-level Tracking and Reconstruction
Authors:
Haonan Chang,
Dhruv Metha Ramesh,
Shijie Geng,
Yuqiu Gan,
Abdeslam Boularias
Abstract:
We present Mono-STAR, the first real-time 3D reconstruction system that simultaneously supports semantic fusion, fast motion tracking, non-rigid object deformation, and topological change under a unified framework. The proposed system solves a new optimization problem incorporating optical-flow-based 2D constraints to deal with fast motion and a novel semantic-aware deformation graph (SAD-graph) f…
▽ More
We present Mono-STAR, the first real-time 3D reconstruction system that simultaneously supports semantic fusion, fast motion tracking, non-rigid object deformation, and topological change under a unified framework. The proposed system solves a new optimization problem incorporating optical-flow-based 2D constraints to deal with fast motion and a novel semantic-aware deformation graph (SAD-graph) for handling topology change. We test the proposed system under various challenging scenes and demonstrate that it significantly outperforms existing state-of-the-art methods.
△ Less
Submitted 30 January, 2023;
originally announced January 2023.
-
Optimal Partitioning of Non-Convex Environments for Minimum Turn Coverage Planning
Authors:
Megnath Ramesh,
Frank Imeson,
Baris Fidan,
Stephen L. Smith
Abstract:
In this paper, we tackle the problem of planning an optimal coverage path for a robot operating indoors. Many existing approaches attempt to discourage turns in the path by covering the environment along the least number of coverage lines, i.e., straight-line paths. This is because turning not only slows down the robot but also negatively affects the quality of coverage, e.g., tools like cameras a…
▽ More
In this paper, we tackle the problem of planning an optimal coverage path for a robot operating indoors. Many existing approaches attempt to discourage turns in the path by covering the environment along the least number of coverage lines, i.e., straight-line paths. This is because turning not only slows down the robot but also negatively affects the quality of coverage, e.g., tools like cameras and cleaning attachments commonly have poor performance around turns. The problem of minimizing coverage lines however is typically solved using heuristics that do not guarantee optimality. In this work, we propose a turn-minimizing coverage planning method that computes the optimal number of axis-parallel (horizontal/vertical) coverage lines for the environment in polynomial time. We do this by formulating a linear program (LP) that optimally partitions the environment into axis-parallel ranks (non-intersecting rectangles of width equal to the tool width). We then generate coverage paths for a set of real-world indoor environments and compare the results with state-of-the-art coverage approaches.
△ Less
Submitted 26 May, 2022; v1 submitted 16 September, 2021;
originally announced September 2021.
-
To Block or Not to Block: Accelerating Mobile Web Pages On-The-Fly Through JavaScript Classification
Authors:
Moumena Chaqfeh,
Muhammad Haseeb,
Waleed Hashmi,
Patrick Inshuti,
Manesha Ramesh,
Matteo Varvello,
Fareed Zaffar,
Lakshmi Subramanian,
Yasir Zaki
Abstract:
The increasing complexity of JavaScript in modern mobile web pages has become a critical performance bottleneck for low-end mobile phone users, especially in developing regions. In this paper, we propose SlimWeb, a novel approach that automatically derives lightweight versions of mobile web pages on-the-fly by eliminating the use of unnecessary JavaScript. SlimWeb consists of a JavaScript classifi…
▽ More
The increasing complexity of JavaScript in modern mobile web pages has become a critical performance bottleneck for low-end mobile phone users, especially in developing regions. In this paper, we propose SlimWeb, a novel approach that automatically derives lightweight versions of mobile web pages on-the-fly by eliminating the use of unnecessary JavaScript. SlimWeb consists of a JavaScript classification service powered by a supervised Machine Learning (ML) model that provides insights into each JavaScript element embedded in a web page. SlimWeb aims to improve the web browsing experience by predicting the class of each element, such that essential elements are preserved and non-essential elements are blocked by the browsers using the service. We motivate the core design of SlimWeb using a user preference survey of 306 users and perform a detailed evaluation of SlimWeb across 500 popular web pages in a developing region on real 3G and 4G cellular networks, along with a user experience study with 20 real-world users and a usage willingness survey of 588 users. Evaluation results show that SlimWeb achieves a 50% reduction in the page load time compared to the original pages, and more than 30% reduction compared to competing solutions, while achieving high similarity scores to the original pages measured via a qualitative evaluation study of 62 users. SlimWeb improves the overall user experience by more than 60% compared to the original pages, while maintaining 90%-100% of the visual and functional components of most pages. Finally, the SlimWeb classifier achieves a median accuracy of 90% in predicting the JavaScript category.
△ Less
Submitted 20 June, 2021;
originally announced June 2021.
-
Medical Image Compression using Wavelet Decomposition for Prediction Method
Authors:
S. M. Ramesh,
A. Shanmugam
Abstract:
In this paper offers a simple and lossless compression method for compression of medical images. Method is based on wavelet decomposition of the medical images followed by the correlation analysis of coefficients. The correlation analyses are the basis of prediction equation for each sub band. Predictor variable selection is performed through coefficient graphic method to avoid multicollinearity…
▽ More
In this paper offers a simple and lossless compression method for compression of medical images. Method is based on wavelet decomposition of the medical images followed by the correlation analysis of coefficients. The correlation analyses are the basis of prediction equation for each sub band. Predictor variable selection is performed through coefficient graphic method to avoid multicollinearity problem and to achieve high prediction accuracy and compression rate. The method is applied on MRI and CT images. Results show that the proposed approach gives a high compression rate for MRI and CT images comparing with state of the art methods.
△ Less
Submitted 11 February, 2010;
originally announced February 2010.