
BINGO! Simple Optimizers Win Big if Problems Collapse to a Few Buckets

Authors: Kishan Kumar Ganguly, Tim Menzies

Abstract: Traditional multi-objective optimization in software engineering (SE) can be slow and complex. This paper introduces the BINGO effect: a novel phenomenon where SE data surprisingly collapses into a tiny fraction of possible solution "buckets" (e.g., only 100 used from 4,096 expected). We show the BINGO effect's prevalence across 39 SE optimization problems. Exploiting this, we optimize 10,000 times faster than state-of-the-art methods, with comparable effectiveness. Our new algorithms (LITE and LINE) demonstrate that simple stochastic selection can match complex optimizers like DEHB. This work explains why simple methods succeed in SE (real data occupies a small corner of possibilities) and guides when to apply them, challenging the need for CPU-heavy optimization. Our data and code are public at GitHub (see anon-artifacts/bingo).

Submitted 4 June, 2025; originally announced June 2025.

arXiv:2504.01202 [pdf, other]

Global explainability of a deep abstaining classifier

Authors: Sayera Dhaubhadel, Jamaludin Mohd-Yusof, Benjamin H. McMahon, Trilce Estrada, Kumkum Ganguly, Adam Spannaus, John P. Gounley, Xiao-Cheng Wu, Eric B. Durbin, Heidi A. Hanson, Tanmoy Bhattacharya

Abstract: We present a global explainability method to characterize sources of errors in the histology prediction task of our real-world multitask convolutional neural network (MTCNN)-based deep abstaining classifier (DAC), for automated annotation of cancer pathology reports from NCI-SEER registries. Our classifier was trained and evaluated on 1.04 million hand-annotated samples and makes simultaneous predictions of cancer site, subsite, histology, laterality, and behavior for each report. The DAC framework enables the model to abstain on ambiguous reports and/or confusing classes to achieve a target accuracy on the retained (non-abstained) samples, but at the cost of decreased coverage. Requiring 97% accuracy on the histology task caused our model to retain only 22% of all samples, mostly the less ambiguous and common classes. Local explainability with the GradInp technique provided a computationally efficient way of obtaining contextual reasoning for thousands of individual predictions. Our method, involving dimensionality reduction of approximately 13000 aggregated local explanations, enabled global identification of sources of errors as hierarchical complexity among classes, label noise, insufficient information, and conflicting evidence. This suggests several strategies such as exclusion criteria, focused annotation, and reduced penalties for errors involving hierarchically related classes to iteratively improve our DAC in this complex real-world implementation.

Submitted 1 April, 2025; originally announced April 2025.

arXiv:2404.03686 [pdf]

Securing Social Spaces: Harnessing Deep Learning to Eradicate Cyberbullying

Authors: Rohan Biswas, Kasturi Ganguly, Arijit Das, Diganta Saha

Abstract: In today's digital world, cyberbullying is a serious problem that can harm the mental and physical health of people who use social media. This paper explains just how serious cyberbullying is and how it really affects individuals exposed to it. It also stresses how important it is to find better ways to detect cyberbullying so that online spaces can be safer. Plus, it talks about how making more accurate tools to spot cyberbullying will be really helpful in the future. Our paper introduces a deep learning-based approach, primarily employing BERT and BiLSTM architectures, to effectively address cyberbullying. This approach is designed to analyse large volumes of posts and predict potential instances of cyberbullying in online spaces. Our results demonstrate the superiority of the hateBERT model, an extension of BERT focused on hate speech detection, among the five models, achieving an accuracy rate of 89.16%. This research is a significant contribution to "Computational Intelligence for Social Transformation," promising a safer and more inclusive digital landscape.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2308.03780 [pdf]

Exploring IoT for real-time CO2 monitoring and analysis

Authors: Abhiroop Sarkar, Debayan Ghosh, Kinshuk Ganguly, Snehal Ghosh, Subhajit Saha

Abstract: As a part of this project, we have developed an IoT-based instrument utilizing the NODE MCU-ESP8266 module, MQ135 gas sensor, and DHT-11 sensor for measuring CO$_2$ levels in parts per million (ppm), temperature, and humidity. The escalating CO$_2$ levels worldwide necessitate constant monitoring and analysis to comprehend the implications for human health, safety, energy efficiency, and environmental well-being. Thus, an efficient and cost-effective solution is imperative to measure and transmit data for statistical analysis and storage. The instrument offers real-time monitoring, enabling a comprehensive understanding of indoor environmental conditions. By providing valuable insights, it facilitates the implementation of measures to ensure health and safety, optimize energy efficiency, and promote effective environmental monitoring. This scientific endeavor aims to contribute to the growing body of knowledge surrounding CO$_2$ levels, temperature, and humidity, fostering sustainable practices and informed decision-making.

Submitted 2 August, 2023; originally announced August 2023.

Comments: 9 pages, 7 figures

ACM Class: C.2.6; J.7

arXiv:2305.08120 [pdf]

doi 10.13140/RG.2.2.29687.19365

Unraveling Cold Start Enigmas in Predictive Analytics for OTT Media: Synergistic Meta-Insights and Multimodal Ensemble Mastery

Authors: K. Ganguly, A. Patra

Abstract: The cold start problem is a common challenge in various domains, including media use cases such as predicting viewership for newly launched shows on Over-The-Top (OTT) platforms. In this study, we propose a generic approach to tackle cold start problems by leveraging metadata and employing multi-model ensemble techniques. Our methodology includes feature engineering, model selection, and an ensemble approach based on a weighted average of predictions. The performance of our proposed method is evaluated using various performance metrics. Our results indicate that the multi-model ensemble approach significantly improves prediction accuracy compared to individual models.

Submitted 14 May, 2023; originally announced May 2023.

arXiv:2203.07290 [pdf, other]

GradTac: Spatio-Temporal Gradient Based Tactile Sensing

Authors: Kanishka Ganguly, Pavan Mantripragada, Chethan M. Parameshwara, Cornelia Fermüller, Nitin J. Sanket, Yiannis Aloimonos

Abstract: Tactile sensing for robotics is achieved through a variety of mechanisms, including magnetic, optical-tactile, and conductive fluid. Currently, the fluid-based sensors have struck the right balance of anthropomorphic sizes and shapes and accuracy of tactile response measurement. However, this design is plagued by a low Signal to Noise Ratio (SNR) due to the fluid based sensing mechanism "damping" the measurement values that are hard to model. To this end, we present a spatio-temporal gradient representation on the data obtained from fluid-based tactile sensors, which is inspired from neuromorphic principles of event based sensing. We present a novel algorithm (GradTac) that converts discrete data points from spatial tactile sensors into spatio-temporal surfaces and tracks tactile contours across these surfaces. Processing the tactile data using the proposed spatio-temporal domain is robust, makes it less susceptible to the inherent noise from the fluid based sensors, and allows accurate tracking of regions of touch as compared to using the raw data. We successfully evaluate and demonstrate the efficacy of GradTac on many real-world experiments performed using the Shadow Dexterous Hand, equipped with the BioTac SP sensors. Specifically, we use it for tracking tactile input across the sensor's surface, measuring relative forces, detecting linear and rotational slip, and for edge tracking.
We also release an accompanying task-agnostic dataset for the BioTac SP, which we hope will provide a resource to compare and quantify various novel approaches, and motivate further research.

Submitted 14 March, 2022; originally announced March 2022.

Comments: 12 pages, 12 figures, 1 table Submitted to Frontiers in Robotics and AI under Multisensory Perception and Learning towards Dexterous Robot Manipulation and Interaction

arXiv:2203.02072 [pdf, other]

X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback

Authors: Jensen Gao, Siddharth Reddy, Glen Berseth, Nicholas Hardy, Nikhilesh Natraj, Karunesh Ganguly, Anca D. Dragan, Sergey Levine

Abstract: We aim to help users communicate their intent to machines using flexible, adaptive interfaces that translate arbitrary user input into desired actions. In this work, we focus on assistive typing applications in which a user cannot operate a keyboard, but can instead supply other inputs, such as webcam images that capture eye gaze or neural activity measured by a brain implant. Standard methods train a model on a fixed dataset of user inputs, then deploy a static interface that does not learn from its mistakes; in part, because extracting an error signal from user behavior can be challenging. We investigate a simple idea that would enable such interfaces to improve over time, with minimal additional effort from the user: online learning from user feedback on the accuracy of the interface's actions. In the typing domain, we leverage backspaces as feedback that the interface did not perform the desired action. We propose an algorithm called x-to-text (X2T) that trains a predictive model of this feedback signal, and uses this model to fine-tune any existing, default interface for translating user input into actions that select words or characters. We evaluate X2T through a small-scale online user study with 12 participants who type sentences by gazing at their desired words, a large-scale observational study on handwriting samples from 60 users, and a pilot study with one participant using an electrocorticography-based brain-computer interface.
The results show that X2T learns to outperform a non-adaptive default interface, stimulates user co-adaptation to the interface, personalizes the interface to individual users, and can leverage offline data collected from the default interface to improve its initial performance and accelerate online learning.

Submitted 6 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: Accepted to International Conference on Learning Representations (ICLR) 2021

arXiv:2012.00501 [pdf]

A Statistical Real-Time Prediction Model for Recommender System

Authors: Md Rifat Arefin, Minhas Kamal, Kishan Kumar Ganguly, Tarek Salah Uddin Mahmud

Abstract: Recommender systems have become an inseparable part of online shopping, and their usability is increasing with the advancement of e-commerce sites. An effective and efficient recommender system benefits both the seller and the buyer significantly. We considered user activities and product information for the filtering process in our proposed recommender system. Our model has achieved an inspiring result (approximately 58% true-positive and 13% false-positive) for the data set provided by RecSys Challenge 2015. This paper aims to describe a statistical model that will help to predict the buying behavior of a user in real-time during a session.

Submitted 1 December, 2020; originally announced December 2020.

arXiv:2011.00712 [pdf, other]

Grasping in the Dark: Zero-Shot Object Grasping Using Tactile Feedback

Authors: Kanishka Ganguly, Behzad Sadrfaridpour, Pavan Mantripragada, Nitin J. Sanket, Cornelia Fermüller, Yiannis Aloimonos

Abstract: Grasping and manipulating a wide variety of objects is a fundamental skill that would determine the success and widespread adaptation of robots in homes. Several end-effector designs for robust manipulation have been proposed but they mostly work when provided with prior information about the objects or equipped with external sensors for estimating object shape or size. Such approaches are limited to many-shot or unknown objects and are prone to estimation errors from external estimation systems. We propose an approach to grasp and manipulate previously unseen or zero-shot objects: the objects without any prior of their shape, size, material and weight properties, using only feedback from tactile sensors which is contrary to the state-of-the-art. Such an approach provides robust manipulation of objects either when the object model is not known or when it is estimated incorrectly from an external system. Our approach is inspired by the ideology of how animals or humans manipulate objects, i.e., by using feedback from their skin. Our grasping and manipulation revolves around the simple notion that objects slip if not grasped stably. This slippage can be detected and counteracted for a robust grasp that is agnostic to the type, shape, size, material and weight of the object. At the crux of our approach is a novel tactile feedback based controller that detects and compensates for slip during grasp.
We successfully evaluate and demonstrate our proposed approach on many real world experiments using the Shadow Dexterous Hand equipped with BioTac SP tactile sensors for different object shapes, sizes, weights and materials. We obtain an overall success rate of 73.5%.

Submitted 16 September, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

Comments: 6 pages, 1 page references, 8 figures, 2 tables. Under review

arXiv:2009.05094 [pdf, other]

Why I'm not Answering: Understanding Determinants of Classification of an Abstaining Classifier for Cancer Pathology Reports

Authors: Sayera Dhaubhadel, Jamaludin Mohd-Yusof, Kumkum Ganguly, Gopinath Chennupati, Sunil Thulasidasan, Nicolas W. Hengartner, Brent J. Mumphrey, Eric B. Durbin, Jennifer A. Doherty, Mireille Lemieux, Noah Schaefferkoetter, Georgia Tourassi, Linda Coyle, Lynne Penberthy, Benjamin H. McMahon, Tanmoy Bhattacharya

Abstract: Safe deployment of deep learning systems in critical real world applications requires models to make very few mistakes, and only under predictable circumstances. In this work, we address this problem using an abstaining classifier that is tuned to have $>$95% accuracy, and then identify the determinants of abstention using LIME. Essentially, we are training our model to learn the attributes of pathology reports that are likely to lead to incorrect classifications, albeit at the cost of reduced sensitivity. We demonstrate an abstaining classifier in a multitask setting for classifying cancer pathology reports from the NCI SEER cancer registries on six tasks of interest. For these tasks, we reduce the classification error rate by factors of 2--5 by abstaining on 25--45% of the reports. For the specific task of classifying cancer site, we are able to identify metastasis, reports involving lymph nodes, and discussion of multiple cancer sites as responsible for many of the classification mistakes, and observe that the extent and types of mistakes vary systematically with cancer site (e.g., breast, lung, and prostate). When combining across three of the tasks, our model classifies 50% of the reports with an accuracy greater than 95% for three of the six tasks, and greater than 85% for all six tasks on the retained samples. Furthermore, we show that LIME provides a better determinant of classification than measures of word occurrence alone.
By combining a deep abstaining classifier with feature identification using LIME, we are able to identify concepts responsible for both correctness and abstention when classifying cancer sites from pathology reports. The improvement of LIME over keyword searches is statistically significant, presumably because words are assessed in context and have been identified as a local determinant of classification.

Submitted 21 April, 2022; v1 submitted 10 September, 2020; originally announced September 2020.

arXiv:2008.11636 [pdf, other]

Impact on the Productivity of Remotely Working IT Professionals of Bangladesh during the Coronavirus Disease 2019

Authors: Kishan Kumar Ganguly, Noshin Tahsin, Mridha Md. Nafis Fuad, Toukir Ahammed, Moumita Asad, Syed Fatiul Huq, A. T. M. Fazlay Rabbi, Kazi Sakib

Abstract: Similar to the rest of the world, the recent pandemic situation has forced the IT professionals of Bangladesh to adopt remote work. The aim of this study is to find out whether remote work can be continued even after the lockdown is lifted. As work from home (WFH) may change various productivity related aspects of the employees, e.g., team dynamics and company dynamics, it is necessary to understand the nature of the change during WFH. Conducting a survey, we asked the IT professionals of Bangladesh how they perceive their level of productivity during WFH and how the factors related to productivity have changed. We analyzed the change and identified the areas affected by WFH. We discovered that resource and workspace related issues and the emotional well-being of the employees have been hampered the most during WFH. We believe that the findings from this study will help to decide how to resolve those issues and will help to understand whether WFH can be continued even after the lockdown is lifted.

Submitted 11 September, 2020; v1 submitted 26 August, 2020; originally announced August 2020.

arXiv:2002.01530 [pdf, other]

Deep Differentiable Grasp Planner for High-DOF Grippers

Authors: Min Liu, Zherong Pan, Kai Xu, Kanishka Ganguly, Dinesh Manocha

Abstract: We present an end-to-end algorithm for training deep neural networks to grasp novel objects. Our algorithm builds all the essential components of a grasping system using a forward-backward automatic differentiation approach, including the forward kinematics of the gripper, the collision between the gripper and the target object, and the metric for grasp poses. In particular, we show that a generalized Q1 grasp metric is defined and differentiable for inexact grasps generated by a neural network, and the derivatives of our generalized Q1 metric can be computed from a sensitivity analysis of the induced optimization problem. We show that the derivatives of the (self-)collision terms can be efficiently computed from a watertight triangle mesh of low-quality. Altogether, our algorithm allows for the computation of grasp poses for high-DOF grippers in an unsupervised mode with no ground truth data, or it improves the results in a supervised mode using a small dataset. Our new learning algorithm significantly simplifies the data preparation for learning-based grasping systems and leads to higher qualities of learned grasps on common 3D shape datasets [7, 49, 26, 25], achieving a 22% higher success rate on physical hardware and a 0.12 higher value on the Q1 grasp quality metric.

Submitted 15 July, 2020; v1 submitted 4 February, 2020; originally announced February 2020.

arXiv:1903.08248 [pdf, other]

Computational Tactile Flow for Anthropomorphic Grippers

Authors: Kanishka Ganguly, Behzad Sadrfaridpour, Cornelia Fermüller, Yiannis Aloimonos

Abstract: Grasping objects requires tight integration between visual and tactile feedback. However, there is an inherent difference in the scale at which both these input modalities operate. It is thus necessary to be able to analyze tactile feedback in isolation in order to gain information about the surface the end-effector is operating on, such that more fine-grained features may be extracted from the surroundings. For tactile perception of the robot, inspired by the concept of the tactile flow in humans, we present the computational tactile flow to improve the analysis of the tactile feedback in robots using a Shadow Dexterous Hand. In the computational tactile flow model, given a sequence of pressure values from the tactile sensors, we define a virtual surface for the pressure values and define the tactile flow as the optical flow of this surface. We provide case studies that demonstrate how the computational tactile flow maps reveal information on the direction of motion and 3D structure of the surface, and feedback regarding the action being performed by the robot.

Submitted 19 March, 2019; originally announced March 2019.

Comments: 8 pages, 18 figures. Submitted to 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2019)

arXiv:1903.00425 [pdf, other]

Generating Grasp Poses for a High-DOF Gripper Using Neural Networks

Authors: Min Liu, Zherong Pan, Kai Xu, Kanishka Ganguly, Dinesh Manocha

Abstract: We present a learning-based method for representing grasp poses of a high-DOF hand using neural networks. Due to redundancy in such high-DOF grippers, there exists a large number of equally effective grasp poses for a given target object, making it difficult for the neural network to find consistent grasp poses. We resolve this ambiguity by generating an augmented dataset that covers many possible grasps for each target object and train our neural networks using a consistency loss function to identify a one-to-one mapping from objects to grasp poses. We further enhance the quality of neural-network-predicted grasp poses using a collision loss function to avoid penetrations. We use an object dataset that combines the BigBIRD Database, the KIT Database, the YCB Database, and the Grasp Dataset to show that our method can generate high-DOF grasp poses with higher accuracy than supervised learning baselines. The quality of the grasp poses is on par with the groundtruth poses in the dataset. In addition, our method is robust and can handle noisy object models such as those constructed from multi-view depth images, allowing our method to be implemented on a 25-DOF Shadow Hand hardware platform.

Submitted 16 July, 2020; v1 submitted 1 March, 2019; originally announced March 2019.

arXiv:1807.04870 [pdf, other]

Extracting Contact and Motion from Manipulation Videos

Authors: Konstantinos Zampogiannis, Kanishka Ganguly, Cornelia Fermuller, Yiannis Aloimonos

Abstract: When we physically interact with our environment using our hands, we touch objects and force them to move: contact and motion are defining properties of manipulation. In this paper, we present an active, bottom-up method for the detection of actor-object contacts and the extraction of moved objects and their motions in RGBD videos of manipulation actions. At the core of our approach lies non-rigid registration: we continuously warp a point cloud model of the observed scene to the current video frame, generating a set of dense 3D point trajectories. Under loose assumptions, we employ simple point cloud segmentation techniques to extract the actor and subsequently detect actor-environment contacts based on the estimated trajectories. For each such interaction, using the detected contact as an attention mechanism, we obtain an initial motion segment for the manipulated object by clustering trajectories in the contact area vicinity and then we jointly refine the object segment and estimate its 6DOF pose in all observed frames. Because of its generality and the fundamental, yet highly informative, nature of its outputs, our approach is applicable to a wide range of perception and planning tasks. We qualitatively evaluate our method on a number of input sequences and present a comprehensive robot imitation learning example, in which we demonstrate the crucial role of our outputs in developing action representations/plans from observation.

Submitted 2 February, 2019; v1 submitted 12 July, 2018; originally announced July 2018.

arXiv:1806.06208 [pdf]

Offline Extraction of Indic Regional Language from Natural Scene Image using Text Segmentation and Deep Convolutional Sequence

Authors: Sauradip Nag, Pallab Kumar Ganguly, Sumit Roy, Sourab Jha, Krishna Bose, Abhishek Jha, Kousik Dasgupta

Abstract: Regional language extraction from a natural scene image is always a challenging proposition due to its dependence on the text information extracted from the image. Text extraction, on the other hand, varies with lighting conditions, arbitrary orientation, inadequate text information, heavy background influence over text, and changes in text appearance. This paper presents a novel unified method for tackling the above challenges. The proposed work uses an image correction and segmentation technique on the existing text detection pipeline, an Efficient and Accurate Scene Text Detector (EAST). EAST uses the standard PVAnet architecture to select features and non-maximal suppression to detect text in an image. Text recognition is done using a combined architecture of a MaxOut convolutional neural network (CNN) and a bidirectional long short-term memory (LSTM) network. After recognizing text using the deep learning based approach, the native languages are translated to English and tokenized using standard text tokenizers. The tokens that very likely represent a location are used to find the Global Positioning System (GPS) coordinates of the location, and subsequently the regional languages spoken in that location are extracted. The proposed method is tested on a self-generated dataset collected from a Government of India dataset and evaluated on a standard dataset to assess the performance of the proposed technique.
Comparative study with a few state-of-the-art methods on text detection, recognition and extraction of regional language from images shows that the proposed method outperforms the existing methods.

Submitted 6 July, 2018; v1 submitted 16 June, 2018; originally announced June 2018.

Comments: Accepted in Second International Conference on Computational Intelligence, Communications, and Business Analytics (CICBA-2018)

arXiv:1802.05330 [pdf, other]

doi 10.1109/LRA.2018.2843445

GapFlyt: Active Vision Based Minimalist Structure-less Gap Detection For Quadrotor Flight

Authors: Nitin J Sanket, Chahat Deep Singh, Kanishka Ganguly, Cornelia Fermüller, Yiannis Aloimonos

Abstract: Although quadrotors, and aerial robots in general, are inherently active agents, their perceptual capabilities in literature so far have been mostly passive in nature. Researchers and practitioners today use traditional computer vision algorithms with the aim of building a representation of general applicability: a 3D reconstruction of the scene. Using this representation, planning tasks are constructed and accomplished to allow the quadrotor to demonstrate autonomous behavior. These methods are inefficient as they are not task driven and such methodologies are not utilized by flying insects and birds. Such agents have been solving the problem of navigation and complex control for ages without the need to build a 3D map and are highly task driven. In this paper, we propose this framework of bio-inspired perceptual design for quadrotors. We use this philosophy to design a minimalist sensori-motor framework for a quadrotor to fly through unknown gaps without a 3D reconstruction of the scene using only a monocular camera and onboard sensing. We successfully evaluate and demonstrate the proposed approach in many real-world experiments with different settings and window shapes, achieving a success rate of 85% at 2.5ms$^{-1}$ even with a minimum tolerance of just 5cm. To our knowledge, this is the first paper which addresses the problem of gap detection of an unknown shape and location with a monocular camera and onboard sensing.

Submitted 1 July, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

Comments: 11 pages, 15 figures, 4 tables. Published in IEEE Robotics and Automation Letters (2018)

Showing 1–17 of 17 results for author: Ganguly, K