Search | arXiv e-print repository

MOMENTS: A Comprehensive Multimodal Benchmark for Theory of Mind

Authors: Emilio Villa-Cueva, S M Masrur Ahmed, Rendi Chevi, Jan Christian Blaise Cruz, Kareem Elzeky, Fermin Cristobal, Alham Fikri Aji, Skyler Wang, Rada Mihalcea, Thamar Solorio

Abstract: Understanding Theory of Mind is essential for building socially intelligent multimodal agents capable of perceiving and interpreting human behavior. We introduce MOMENTS (Multimodal Mental States), a comprehensive benchmark designed to assess the ToM capabilities of multimodal large language models (LLMs) through realistic, narrative-rich scenarios presented in short films. MOMENTS includes over 2… ▽ More Understanding Theory of Mind is essential for building socially intelligent multimodal agents capable of perceiving and interpreting human behavior. We introduce MOMENTS (Multimodal Mental States), a comprehensive benchmark designed to assess the ToM capabilities of multimodal large language models (LLMs) through realistic, narrative-rich scenarios presented in short films. MOMENTS includes over 2,344 multiple-choice questions spanning seven distinct ToM categories. The benchmark features long video context windows and realistic social interactions that provide deeper insight into characters' mental states. While the visual modality generally enhances model performance, current systems still struggle to integrate it effectively, underscoring the need for further research into AI's multimodal understanding of human behavior. △ Less

Submitted 6 July, 2025; originally announced July 2025.

arXiv:2506.06486 [pdf, ps, other]

A Certified Unlearning Approach without Access to Source Data

Authors: Umit Yigit Basaran, Sk Miraj Ahmed, Amit Roy-Chowdhury, Basak Guler

Abstract: With the growing adoption of data privacy regulations, the ability to erase private or copyrighted information from trained models has become a crucial requirement. Traditional unlearning methods often assume access to the complete training dataset, which is unrealistic in scenarios where the source data is no longer available. To address this challenge, we propose a certified unlearning framework… ▽ More With the growing adoption of data privacy regulations, the ability to erase private or copyrighted information from trained models has become a crucial requirement. Traditional unlearning methods often assume access to the complete training dataset, which is unrealistic in scenarios where the source data is no longer available. To address this challenge, we propose a certified unlearning framework that enables effective data removal \final{without access to the original training data samples}. Our approach utilizes a surrogate dataset that approximates the statistical properties of the source data, allowing for controlled noise scaling based on the statistical distance between the two. \updated{While our theoretical guarantees assume knowledge of the exact statistical distance, practical implementations typically approximate this distance, resulting in potentially weaker but still meaningful privacy guarantees.} This ensures strong guarantees on the model's behavior post-unlearning while maintaining its overall utility. We establish theoretical bounds, introduce practical noise calibration techniques, and validate our method through extensive experiments on both synthetic and real-world datasets. The results demonstrate the effectiveness and reliability of our approach in privacy-sensitive settings. △ Less

Submitted 6 June, 2025; originally announced June 2025.

Comments: Accepted by ICML 2025

arXiv:2404.12627 [pdf, other]

A Soft e-Textile Sensor for Enhanced Deep Learning-based Shape Sensing of Soft Continuum Robots

Authors: Eric Vincent Galeta, Ayman A. Nada, Sabah M. Ahmed, Victor Parque, Haitham El-Hussieny

Abstract: The safety and accuracy of robotic navigation hold paramount importance, especially in the realm of soft continuum robotics, where the limitations of traditional rigid sensors become evident. Encoders, piezoresistive, and potentiometer sensors often fail to integrate well with the flexible nature of these robots, adding unwanted bulk and rigidity. To overcome these hurdles, our study presents a ne… ▽ More The safety and accuracy of robotic navigation hold paramount importance, especially in the realm of soft continuum robotics, where the limitations of traditional rigid sensors become evident. Encoders, piezoresistive, and potentiometer sensors often fail to integrate well with the flexible nature of these robots, adding unwanted bulk and rigidity. To overcome these hurdles, our study presents a new approach to shape sensing in soft continuum robots through the use of soft e-textile resistive sensors. This sensor, designed to flawlessly integrate with the robot's structure, utilizes a resistive material that adjusts its resistance in response to the robot's movements and deformations. This adjustment facilitates the capture of multidimensional force measurements across the soft sensor layers. A deep Convolutional Neural Network (CNN) is employed to decode the sensor signals, enabling precise estimation of the robot's shape configuration based on the detailed data from the e-textile sensor. Our research investigates the efficacy of this e-textile sensor in determining the curvature parameters of soft continuum robots. The findings are encouraging, showing that the soft e-textile sensor not only matches but potentially exceeds the capabilities of traditional rigid sensors in terms of shape sensing and estimation. This advancement significantly boosts the safety and efficiency of robotic navigation systems. △ Less

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2402.08769 [pdf, other]

FLASH: Federated Learning Across Simultaneous Heterogeneities

Authors: Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury

Abstract: The key premise of federated learning (FL) is to train ML models across a diverse set of data-owners (clients), without exchanging local data. An overarching challenge to this date is client heterogeneity, which may arise not only from variations in data distribution, but also in data quality, as well as compute/communication latency. An integrated view of these diverse and concurrent sources of h… ▽ More The key premise of federated learning (FL) is to train ML models across a diverse set of data-owners (clients), without exchanging local data. An overarching challenge to this date is client heterogeneity, which may arise not only from variations in data distribution, but also in data quality, as well as compute/communication latency. An integrated view of these diverse and concurrent sources of heterogeneity is critical; for instance, low-latency clients may have poor data quality, and vice versa. In this work, we propose FLASH(Federated Learning Across Simultaneous Heterogeneities), a lightweight and flexible client selection algorithm that outperforms state-of-the-art FL frameworks under extensive sources of heterogeneity, by trading-off the statistical information associated with the client's data quality, data distribution, and latency. FLASH is the first method, to our knowledge, for handling all these heterogeneities in a unified manner. To do so, FLASH models the learning dynamics through contextual multi-armed bandits (CMAB) and dynamically selects the most promising clients. Through extensive experiments, we demonstrate that FLASH achieves substantial and consistent improvements over state-of-the-art baselines -- as much as 10% in absolute accuracy -- thanks to its unified approach. Importantly, FLASH also outperforms federated aggregation methods that are designed to handle highly heterogeneous settings and even enjoys a performance boost when integrated with them. △ Less

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2401.04130 [pdf, other]

Plug-and-Play Transformer Modules for Test-Time Adaptation

Authors: Xiangyu Chang, Sk Miraj Ahmed, Srikanth V. Krishnamurthy, Basak Guler, Ananthram Swami, Samet Oymak, Amit K. Roy-Chowdhury

Abstract: Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model. However, the number of domains encountered during test time can be very large, and the data is usually unlabeled. Thus, adaptation to new domains is challenging; it is also impractical to generate… ▽ More Parameter-efficient tuning (PET) methods such as LoRA, Adapter, and Visual Prompt Tuning (VPT) have found success in enabling adaptation to new domains by tuning small modules within a transformer model. However, the number of domains encountered during test time can be very large, and the data is usually unlabeled. Thus, adaptation to new domains is challenging; it is also impractical to generate customized tuned modules for each such domain. Toward addressing these challenges, this work introduces PLUTO: a Plug-and-pLay modUlar Test-time domain adaptatiOn strategy. We pre-train a large set of modules, each specialized for different source domains, effectively creating a ``module store''. Given a target domain with few-shot unlabeled data, we introduce an unsupervised test-time adaptation (TTA) method to (1) select a sparse subset of relevant modules from this store and (2) create a weighted combination of selected modules without tuning their weights. This plug-and-play nature enables us to harness multiple most-relevant source domains in a single inference call. Comprehensive evaluations demonstrate that PLUTO uniformly outperforms alternative TTA methods and that selecting $\leq$5 modules suffice to extract most of the benefit. At a high level, our method equips pre-trained transformers with the capability to dynamically adapt to new domains, motivating a new paradigm for efficient and scalable domain adaptation. △ Less

Submitted 8 February, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

arXiv:2401.02561 [pdf, other]

CONTRAST: Continual Multi-source Adaptation to Dynamic Distributions

Authors: Sk Miraj Ahmed, Fahim Faisal Niloy, Xiangyu Chang, Dripta S. Raychaudhuri, Samet Oymak, Amit K. Roy-Chowdhury

Abstract: Adapting to dynamic data distributions is a practical yet challenging task. One effective strategy is to use a model ensemble, which leverages the diverse expertise of different models to transfer knowledge to evolving data distributions. However, this approach faces difficulties when the dynamic test distribution is available only in small batches and without access to the original source data. T… ▽ More Adapting to dynamic data distributions is a practical yet challenging task. One effective strategy is to use a model ensemble, which leverages the diverse expertise of different models to transfer knowledge to evolving data distributions. However, this approach faces difficulties when the dynamic test distribution is available only in small batches and without access to the original source data. To address the challenge of adapting to dynamic distributions in such practical settings, we propose Continual Multi-source Adaptation to Dynamic Distributions (CONTRAST), a novel method that optimally combines multiple source models to adapt to the dynamic test data. CONTRAST has two distinguishing features. First, it efficiently computes the optimal combination weights to combine the source models to adapt to the test data distribution continuously as a function of time. Second, it identifies which of the source model parameters to update so that only the model which is most correlated to the target data is adapted, leaving the less correlated ones untouched; this mitigates the issue of ``forgetting" the source model parameters by focusing only on the source model that exhibits the strongest correlation with the test batch distribution. Through theoretical analysis we show that the proposed method is able to optimally combine the source models and prioritize updates to the model least prone to forgetting. Experimental analysis on diverse datasets demonstrates that the combination of multiple source models does at least as well as the best source (with hindsight knowledge), and performance does not degrade as the test data distribution changes over time (robust to forgetting). △ Less

Submitted 6 November, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: NeurIPS 2024

arXiv:2311.04991 [pdf, other]

Effective Restoration of Source Knowledge in Continual Test Time Adaptation

Authors: Fahim Faisal Niloy, Sk Miraj Ahmed, Dripta S. Raychaudhuri, Samet Oymak, Amit K. Roy-Chowdhury

Abstract: Traditional test-time adaptation (TTA) methods face significant challenges in adapting to dynamic environments characterized by continuously changing long-term target distributions. These challenges primarily stem from two factors: catastrophic forgetting of previously learned valuable source knowledge and gradual error accumulation caused by miscalibrated pseudo labels. To address these issues, t… ▽ More Traditional test-time adaptation (TTA) methods face significant challenges in adapting to dynamic environments characterized by continuously changing long-term target distributions. These challenges primarily stem from two factors: catastrophic forgetting of previously learned valuable source knowledge and gradual error accumulation caused by miscalibrated pseudo labels. To address these issues, this paper introduces an unsupervised domain change detection method that is capable of identifying domain shifts in dynamic environments and subsequently resets the model parameters to the original source pre-trained values. By restoring the knowledge from the source, it effectively corrects the negative consequences arising from the gradual deterioration of model parameters caused by ongoing shifts in the domain. Our method involves progressive estimation of global batch-norm statistics specific to each domain, while keeping track of changes in the statistics triggered by domain shifts. Importantly, our method is agnostic to the specific adaptation technique employed and thus, can be incorporated to existing TTA methods to enhance their performance in dynamic environments. We perform extensive experiments on benchmark datasets to demonstrate the superior performance of our method compared to state-of-the-art adaptation methods. △ Less

Submitted 8 November, 2023; originally announced November 2023.

Comments: WACV 2024

arXiv:2308.11880 [pdf, other]

SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets

Authors: Cody Simons, Dripta S. Raychaudhuri, Sk Miraj Ahmed, Suya You, Konstantinos Karydis, Amit K. Roy-Chowdhury

Abstract: Scene understanding using multi-modal data is necessary in many applications, e.g., autonomous navigation. To achieve this in a variety of situations, existing models must be able to adapt to shifting data distributions without arduous data annotation. Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data. Both these a… ▽ More Scene understanding using multi-modal data is necessary in many applications, e.g., autonomous navigation. To achieve this in a variety of situations, existing models must be able to adapt to shifting data distributions without arduous data annotation. Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data. Both these assumptions may be problematic for many applications. Source data may not be available due to privacy, security, or economic concerns. Assuming the existence of paired multi-modal data for training also entails significant data collection costs and fails to take advantage of widely available freely distributed pre-trained uni-modal models. In this work, we relax both of these assumptions by addressing the problem of adapting a set of models trained independently on uni-modal data to a target domain consisting of unlabeled multi-modal data, without having access to the original source dataset. Our proposed approach solves this problem through a switching framework which automatically chooses between two complementary methods of cross-modal pseudo-label fusion -- agreement filtering and entropy weighting -- based on the estimated domain gap. We demonstrate our work on the semantic segmentation problem. Experiments across seven challenging adaptation scenarios verify the efficacy of our approach, achieving results comparable to, and in some cases outperforming, methods which assume access to source data. Our method achieves an improvement in mIoU of up to 12% over competing baselines. Our code is publicly available at https://github.com/csimo005/SUMMIT. △ Less

Submitted 22 August, 2023; originally announced August 2023.

Comments: 12 pages, 5 figures, 9 tables, ICCV 2023

arXiv:2307.05375 [pdf, other]

Emotion Analysis on EEG Signal Using Machine Learning and Neural Network

Authors: S. M. Masrur Ahmed, Eshaan Tanzim Sabur

Abstract: Emotion has a significant influence on how one thinks and interacts with others. It serves as a link between how a person feels and the actions one takes, or it could be said that it influences one's life decisions on occasion. Since the patterns of emotions and their reflections vary from person to person, their inquiry must be based on approaches that are effective over a wide range of populatio… ▽ More Emotion has a significant influence on how one thinks and interacts with others. It serves as a link between how a person feels and the actions one takes, or it could be said that it influences one's life decisions on occasion. Since the patterns of emotions and their reflections vary from person to person, their inquiry must be based on approaches that are effective over a wide range of population regions. To extract features and enhance accuracy, emotion recognition using brain waves or EEG signals requires the implementation of efficient signal processing techniques. Various approaches to human-machine interaction technologies have been ongoing for a long time, and in recent years, researchers have had great success in automatically understanding emotion using brain signals. In our research, several emotional states were classified and tested on EEG signals collected from a well-known publicly available dataset, the DEAP Dataset, using SVM (Support Vector Machine), KNN (K-Nearest Neighbor), and an advanced neural network model, RNN (Recurrent Neural Network), trained with LSTM (Long Short Term Memory). The main purpose of this study is to improve ways to improve emotion recognition performance using brain signals. Emotions, on the other hand, can change with time. As a result, the changes in emotion over time are also examined in our research. △ Less

Submitted 9 July, 2023; originally announced July 2023.

arXiv:2209.04027 [pdf, other]

Cross-Modal Knowledge Transfer Without Task-Relevant Source Data

Authors: Sk Miraj Ahmed, Suhas Lohit, Kuan-Chuan Peng, Michael J. Jones, Amit K. Roy-Chowdhury

Abstract: Cost-effective depth and infrared sensors as alternatives to usual RGB sensors are now a reality, and have some advantages over RGB in domains like autonomous navigation and remote sensing. As such, building computer vision and deep learning systems for depth and infrared data are crucial. However, large labeled datasets for these modalities are still lacking. In such cases, transferring knowledge… ▽ More Cost-effective depth and infrared sensors as alternatives to usual RGB sensors are now a reality, and have some advantages over RGB in domains like autonomous navigation and remote sensing. As such, building computer vision and deep learning systems for depth and infrared data are crucial. However, large labeled datasets for these modalities are still lacking. In such cases, transferring knowledge from a neural network trained on a well-labeled large dataset in the source modality (RGB) to a neural network that works on a target modality (depth, infrared, etc.) is of great value. For reasons like memory and privacy, it may not be possible to access the source data, and knowledge transfer needs to work with only the source models. We describe an effective solution, SOCKET: SOurce-free Cross-modal KnowledgE Transfer for this challenging task of transferring knowledge from one source modality to a different target modality without access to task-relevant source data. The framework reduces the modality gap using paired task-irrelevant data, as well as by matching the mean and variance of the target features with the batch-norm statistics that are present in the source models. We show through extensive experiments that our method significantly outperforms existing source-free methods for classification tasks which do not account for the modality gap. △ Less

Submitted 8 September, 2022; originally announced September 2022.

arXiv:2104.01845 [pdf, other]

Unsupervised Multi-source Domain Adaptation Without Access to Source Data

Authors: Sk Miraj Ahmed, Dripta S. Raychaudhuri, Sujoy Paul, Samet Oymak, Amit K. Roy-Chowdhury

Abstract: Unsupervised Domain Adaptation (UDA) aims to learn a predictor model for an unlabeled domain by transferring knowledge from a separate labeled source domain. However, most of these conventional UDA approaches make the strong assumption of having access to the source data during training, which may not be very practical due to privacy, security and storage concerns. A recent line of work addressed… ▽ More Unsupervised Domain Adaptation (UDA) aims to learn a predictor model for an unlabeled domain by transferring knowledge from a separate labeled source domain. However, most of these conventional UDA approaches make the strong assumption of having access to the source data during training, which may not be very practical due to privacy, security and storage concerns. A recent line of work addressed this problem and proposed an algorithm that transfers knowledge to the unlabeled target domain from a single source model without requiring access to the source data. However, for adaptation purposes, if there are multiple trained source models available to choose from, this method has to go through adapting each and every model individually, to check for the best source. Thus, we ask the question: can we find the optimal combination of source models, with no source data and without target labels, whose performance is no worse than the single best source? To answer this, we propose a novel and efficient algorithm which automatically combines the source models with suitable weights in such a way that it performs at least as good as the best source model. We provide intuitive theoretical insights to justify our claim. Furthermore, extensive experiments are conducted on several benchmark datasets to show the effectiveness of our algorithm, where in most cases, our method not only reaches best source accuracy but also outperforms it. △ Less

Submitted 5 April, 2021; originally announced April 2021.

Comments: This paper will appear at CVPR 2021

arXiv:2007.11149 [pdf, other]

Camera On-boarding for Person Re-identification using Hypothesis Transfer Learning

Authors: Sk Miraj Ahmed, Aske R Lejbølle, Rameswar Panda, Amit K. Roy-Chowdhury

Abstract: Most of the existing approaches for person re-identification consider a static setting where the number of cameras in the network is fixed. An interesting direction, which has received little attention, is to explore the dynamic nature of a camera network, where one tries to adapt the existing re-identification models after on-boarding new cameras, with little additional effort. There have been a… ▽ More Most of the existing approaches for person re-identification consider a static setting where the number of cameras in the network is fixed. An interesting direction, which has received little attention, is to explore the dynamic nature of a camera network, where one tries to adapt the existing re-identification models after on-boarding new cameras, with little additional effort. There have been a few recent methods proposed in person re-identification that attempt to address this problem by assuming the labeled data in the existing network is still available while adding new cameras. This is a strong assumption since there may exist some privacy issues for which one may not have access to those data. Rather, based on the fact that it is easy to store the learned re-identifications models, which mitigates any data privacy concern, we develop an efficient model adaptation approach using hypothesis transfer learning that aims to transfer the knowledge using only source models and limited labeled data, but without using any source camera data from the existing network. Our approach minimizes the effect of negative transfer by finding an optimal weighted combination of multiple source models for transferring the knowledge. Extensive experiments on four challenging benchmark datasets with a variable number of cameras well demonstrate the efficacy of our proposed approach over state-of-the-art methods. △ Less

Submitted 5 August, 2020; v1 submitted 21 July, 2020; originally announced July 2020.

Comments: Accepted to CVPR 2020

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020) 12144-12153

arXiv:1904.04218 [pdf, other]

Least-squares registration of point sets over SE (d) using closed-form projections

Authors: Sk. Miraj Ahmed, Niladri Ranjan Das, Kunal Narayan Chaudhury

Abstract: Consider the problem of registering multiple point sets in some $d$-dimensional space using rotations and translations. Assume that there are sets with common points, and moreover the pairwise correspondences are known for such sets. We consider a least-squares formulation of this problem, where the variables are the transforms associated with the point sets. The present novelty is that we reduce… ▽ More Consider the problem of registering multiple point sets in some $d$-dimensional space using rotations and translations. Assume that there are sets with common points, and moreover the pairwise correspondences are known for such sets. We consider a least-squares formulation of this problem, where the variables are the transforms associated with the point sets. The present novelty is that we reduce this nonconvex problem to an optimization over the positive semidefinite cone, where the objective is linear but the constraints are nevertheless nonconvex. We propose to solve this using variable splitting and the alternating directions method of multipliers (ADMM). Due to the linearity of the objective and the structure of constraints, the ADMM subproblems are given by projections with closed-form solutions. In particular, for $m$ point sets, the dominant cost per iteration is the partial eigendecomposition of an $md \times md$ matrix, and $m-1$ singular value decompositions of $d \times d$ matrices. We empirically show that for appropriate parameter settings, the proposed solver has a large convergence basin and is stable under perturbations. As applications, we use our method for $2$D shape matching and $3$D multiview registration. In either application, we model the shapes/scans as point sets and determine the pairwise correspondences using ICP. In particular, our algorithm compares favorably with existing methods for multiview reconstruction in terms of timing and accuracy. △ Less

Submitted 12 April, 2019; v1 submitted 8 April, 2019; originally announced April 2019.

Comments: To appear in Computer Vision and Image Understanding (Elsevier)

arXiv:1902.00705 [pdf]

Big Data and Social/Medical Sciences: State of the Art and Future Trends

Authors: Adil E. Rajput, Samara M. Ahmed

Abstract: The explosion of data on the internet is a direct corollary of the social media platform. With petabytes of data being generated by end users, the researchers have access to unprecedented amount of data (Big Data). Such data provides an insight into user mental state and hence can be utilized to produce clinical evidence. This lofty goal requires a thorough understanding of not only the mental hea… ▽ More The explosion of data on the internet is a direct corollary of the social media platform. With petabytes of data being generated by end users, the researchers have access to unprecedented amount of data (Big Data). Such data provides an insight into user mental state and hence can be utilized to produce clinical evidence. This lofty goal requires a thorough understanding of not only the mental health issues but also the technology trends underlying the Big Data and how they can be leveraged effectively. The paper looks at various such concepts, provides an overview and enumerates the work that has been done in this realm. Furthermore, we provide guidelines for future work that will help in streamlining the Big Data use in social/medical sciences. △ Less

Submitted 2 February, 2019; originally announced February 2019.

Comments: arXiv admin note: text overlap with arXiv:1902.00679

arXiv:1809.10470 [pdf]

Object Detection and Motion Planning for Automated Welding of Tubular Joints

Authors: Syeda Mariam Ahmed, Yan Zhi Tan, Gim Hee Lee, Chee Meng Chew, Chee Khiang Pang

Abstract: Automatic welding of tubular TKY joints is an important and challenging task for the marine and offshore industry. In this paper, a framework for tubular joint detection and motion planning is proposed. The pose of the real tubular joint is detected using RGB-D sensors, which is used to obtain a real-to-virtual mapping for positioning the workpiece in a virtual environment. For motion planning, a… ▽ More Automatic welding of tubular TKY joints is an important and challenging task for the marine and offshore industry. In this paper, a framework for tubular joint detection and motion planning is proposed. The pose of the real tubular joint is detected using RGB-D sensors, which is used to obtain a real-to-virtual mapping for positioning the workpiece in a virtual environment. For motion planning, a Bi-directional Transition based Rapidly exploring Random Tree (BiTRRT) algorithm is used to generate trajectories for reaching the desired goals. The complete framework is verified with experiments, and the results show that the robot welding torch is able to transit without collision to desired goals which are close to the tubular joint. △ Less

Submitted 27 September, 2018; originally announced September 2018.

arXiv:1809.10468 [pdf]

Edge and Corner Detection for Unorganized 3D Point Clouds with Application to Robotic Welding

Authors: Syeda Mariam Ahmed, Yan Zhi Tan, Chee Meng Chew, Abdullah Al Mamun, Fook Seng Wong

Abstract: In this paper, we propose novel edge and corner detection algorithms for unorganized point clouds. Our edge detection method evaluates symmetry in a local neighborhood and uses an adaptive density based threshold to differentiate 3D edge points. We extend this algorithm to propose a novel corner detector that clusters curvature vectors and uses their geometrical statistics to classify a point as c… ▽ More In this paper, we propose novel edge and corner detection algorithms for unorganized point clouds. Our edge detection method evaluates symmetry in a local neighborhood and uses an adaptive density based threshold to differentiate 3D edge points. We extend this algorithm to propose a novel corner detector that clusters curvature vectors and uses their geometrical statistics to classify a point as corner. We perform rigorous evaluation of the algorithms on RGB-D semantic segmentation and 3D washer models from the ShapeNet dataset and report higher precision and recall scores. Finally, we also demonstrate how our edge and corner detectors can be used as a novel approach towards automatic weld seam detection for robotic welding. We propose to generate weld seams directly from a point cloud as opposed to using 3D models for offline planning of welding paths. For this application, we show a comparison between Harris 3D and our proposed approach on a panel workpiece. △ Less

Submitted 27 September, 2018; originally announced September 2018.

arXiv:1502.04190

An Adaptive Sampling Approach to 3D Reconstruction of Weld Joint

Authors: Soheil Keshmiri, Syeda Mariam Ahmed, Yue Wu, Chee Meng Chew, Chee Khiang Pang

Abstract: We present an adaptive sampling approach to 3D reconstruction of the welding joint using the point cloud that is generated by a laser sensor. We start with a randomized strategy to approximate the surface of the volume of interest through selection of a number of pivotal candidates. Furthermore, we introduce three proposal distributions over the neighborhood of each of these pivots to adaptively s… ▽ More We present an adaptive sampling approach to 3D reconstruction of the welding joint using the point cloud that is generated by a laser sensor. We start with a randomized strategy to approximate the surface of the volume of interest through selection of a number of pivotal candidates. Furthermore, we introduce three proposal distributions over the neighborhood of each of these pivots to adaptively sample from their neighbors to refine the original randomized approximation to incrementally reconstruct this welding space. We prevent our algorithm from being trapped in a neighborhood via permanently labeling the visited samples. In addition, we accumulate the accepted candidates along with their selected neighbors in a queue structure to allow every selected sample to contribute to the evolution of the reconstructed welding space as the algorithm progresses. We analyze the performance of our adaptive sampling algorithm in contrast to the random sampling, with and without replacement, to show a significant improvement in total number of samples that are drawn to identify the region of interest, thereby expanding upon neighboring samples to extract the entire region in a fewer iterations and a shorter computation time. △ Less

Submitted 8 July, 2015; v1 submitted 14 February, 2015; originally announced February 2015.

Comments: Disapproval of funding organization

arXiv:1410.3987

Model-Free 3D Reconstruction of Weld Joint Using Laser Scanning

Authors: Soheil Keshmiri, Syeda Mariam Ahmed, Yue Wu, Chee Meng Chew, Chee Khiang Pang

Abstract: This article presents a novel utilization of the concept of entropy in information theory to model-free 3D reconstruction of weld joint in presence of noise. We show that our formulation attains its global minimum at the upper edge of this joint. This property significantly simplifies the extraction of this welding joint. Furthermore, we present an approach to compute the volume of this extracted… ▽ More This article presents a novel utilization of the concept of entropy in information theory to model-free 3D reconstruction of weld joint in presence of noise. We show that our formulation attains its global minimum at the upper edge of this joint. This property significantly simplifies the extraction of this welding joint. Furthermore, we present an approach to compute the volume of this extracted space to facilitate the monitoring of the progress of the welding task. Moreover, we provide a preliminary analysis of the effect of variation of the noise on the extraction process of this space to realize the impact of this noise on the computation of its area and volume. △ Less

Submitted 8 July, 2015; v1 submitted 15 October, 2014; originally announced October 2014.

Comments: Disapproval of funding organization

Showing 1–18 of 18 results for author: Ahmed, S M