
doi 10.1145/3719027.3765145

Fingerprinting Deep Packet Inspection Devices by Their Ambiguities

Authors: Diwen Xue, Armin Huremagic, Wayne Wang, Ram Sundara Raman, Roya Ensafi

Abstract: Users around the world face escalating network interference such as censorship, throttling, and interception, largely driven by the commoditization and growing availability of Deep Packet Inspection (DPI) devices. Once reserved for a few well-resourced nation-state actors, the ability to interfere with traffic at scale is now within reach of nearly any network operator. Despite this proliferation, our understanding of DPIs and their deployments on the Internet remains limited: as network intermediaries, DPIs are unresponsive to conventional host-based scanning tools, and DPI vendors actively obscure their products, further complicating measurement efforts. In this work, we present a remote measurement framework, dMAP (DPI Mapper), that derives behavioral fingerprints for DPIs to differentiate and cluster these otherwise indistinguishable middleboxes at scale, as a first step toward active reconnaissance of DPIs on the Internet. Our key insight is that parsing and interpreting traffic as a network intermediary inherently involves ambiguities, from under-specified protocol behaviors to differing RFC interpretations, which force DPI vendors into independent implementation choices that create measurable variance among DPIs. Based on differential fuzzing, dMAP systematically discovers, selects, and deploys specialized probes that translate DPI internal parsing behaviors into externally observable fingerprints.
Applying dMAP to DPI deployments globally, we demonstrate its practical feasibility, showing that even a modest set of 20-40 discriminative probes reliably differentiates a wide range of DPI implementations, including major nation-state censorship infrastructures and commercial DPI products. We discuss how our fingerprinting methodology generalizes beyond censorship to other forms of targeted interference.
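The idea that devices resolving the same ambiguous input differently yield distinguishable response vectors can be illustrated with a toy probe-selection sketch. It assumes each device's reaction to a probe has already been reduced to a discrete label; the device names, probe names, labels, and the greedy "separate the most unresolved pairs" criterion are all hypothetical and are not dMAP's actual fuzzing or selection procedure.

```python
from itertools import combinations

def fingerprint(responses, probes):
    """A device's behavioral fingerprint: its responses to the chosen probes, in order."""
    return tuple(responses[p] for p in probes)

def _separated(observations, pairs, probe):
    """Number of device pairs whose responses to `probe` differ."""
    return sum(observations[a][probe] != observations[b][probe] for a, b in pairs)

def greedy_select(observations, budget):
    """Greedily pick probes that split the most still-indistinguishable device pairs.

    observations maps device -> {probe: response label}; every device is assumed
    to have been tested with the same probe set.
    """
    devices = sorted(observations)
    all_probes = sorted(next(iter(observations.values())))
    unresolved = set(combinations(devices, 2))  # pairs not yet told apart
    chosen = []
    while unresolved and len(chosen) < budget:
        candidates = [p for p in all_probes if p not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda p: _separated(observations, unresolved, p))
        if _separated(observations, unresolved, best) == 0:
            break  # no remaining probe separates the leftover pairs
        chosen.append(best)
        # keep only the pairs that still answered this probe identically
        unresolved = {(a, b) for a, b in unresolved
                      if observations[a][best] == observations[b][best]}
    return chosen

# Toy observations for three hypothetical devices and three hypothetical probes.
obs = {
    "dpi_A": {"p1": "reset", "p2": "pass", "p3": "drop"},
    "dpi_B": {"p1": "reset", "p2": "block", "p3": "drop"},
    "dpi_C": {"p1": "pass", "p2": "block", "p3": "drop"},
}
probes = greedy_select(obs, budget=5)
for device in sorted(obs):
    print(device, fingerprint(obs[device], probes))
```

Here probe `p3` is never chosen because every device answers it identically, which mirrors why only discriminative probes are worth deploying. Real responses are noisier (timeouts, partial resets), so a practical selector would also need repeated trials.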

Submitted 10 September, 2025; originally announced September 2025.

Comments: In: Proceedings of the 2025 ACM SIGSAC Conference on Computer and Communications Security, 2025

arXiv:2509.04047

TensoIS: A Step Towards Feed-Forward Tensorial Inverse Subsurface Scattering for Perlin Distributed Heterogeneous Media

Authors: Ashish Tiwari, Satyam Bhardwaj, Yash Bachwana, Parag Sarvoday Sahu, T. M. Feroz Ali, Bhargava Chintalapati, Shanmuganathan Raman

Abstract: Estimating the scattering parameters of heterogeneous media from images is a severely under-constrained and challenging problem. Most existing approaches model the BSSRDF either through analysis-by-synthesis, approximating complex path integrals, or through differentiable volume rendering techniques that account for heterogeneity. Only a few studies have applied learning-based methods to estimate subsurface scattering parameters, and they assume homogeneous media. To our knowledge, no specific distribution is known that can explicitly model heterogeneous scattering parameters in the real world. Notably, procedural noise models such as Perlin and Fractal Perlin noise have been effective in representing the intricate heterogeneities of natural, organic, and inorganic surfaces. Leveraging this, we first create HeteroSynth, a synthetic dataset comprising photorealistic images of heterogeneous media whose scattering parameters are modeled using Fractal Perlin noise. We then propose Tensorial Inverse Scattering (TensoIS), a learning-based feed-forward framework that estimates these Perlin-distributed heterogeneous scattering parameters from sparse multi-view image observations. Instead of directly predicting the 3D scattering parameter volume, TensoIS uses learnable low-rank tensor components to represent the scattering volume.
We evaluate TensoIS on unseen heterogeneous variations over shapes from the HeteroSynth test set, on smoke and cloud geometries obtained from open-source realistic volumetric simulations, and on some real-world samples to establish its effectiveness for inverse scattering. Overall, this study explores the Perlin noise distribution, given the lack of any such well-defined distribution in the literature, as a way to potentially model real-world heterogeneous scattering in a feed-forward manner.
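To show why a low-rank tensor representation is much more compact than a dense 3D volume, here is the simplest such factorization, a rank-R CP decomposition, in which each voxel value is a sum of R products of per-axis 1D factors. This is an illustrative assumption: the abstract does not say which decomposition TensoIS uses, and the factor values below are made up.

```python
def cp_voxel(factors_x, factors_y, factors_z, i, j, k):
    """Value at voxel (i, j, k) of a rank-R CP tensor: sum_r x_r[i] * y_r[j] * z_r[k]."""
    return sum(x[i] * y[j] * z[k] for x, y, z in zip(factors_x, factors_y, factors_z))

def cp_dense(factors_x, factors_y, factors_z):
    """Materialize the full volume as nested lists indexed vol[i][j][k]."""
    nx, ny, nz = len(factors_x[0]), len(factors_y[0]), len(factors_z[0])
    return [[[cp_voxel(factors_x, factors_y, factors_z, i, j, k) for k in range(nz)]
             for j in range(ny)]
            for i in range(nx)]

def param_counts(n, rank):
    """(dense voxel count, CP factor count) for an n x n x n volume."""
    return n ** 3, 3 * rank * n

# Rank-2 factors for a tiny 2 x 2 x 2 volume (made-up values).
fx = [[1, 0], [0, 1]]
fy = [[1, 1], [0, 2]]
fz = [[2, 0], [1, 1]]
print(cp_dense(fx, fy, fz))
print(param_counts(128, 16))  # (2097152, 6144)
```

A learning-based method in this style would predict or optimize the small factor lists rather than every voxel, which is where the storage saving comes from.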

Submitted 4 September, 2025; originally announced September 2025.

Comments: To appear in Pacific Graphics 2025 (CGF Journal Track), Project page: https://yashbachwana.github.io/TensoIS/

arXiv:2508.18944

PanoHair: Detailed Hair Strand Synthesis on Volumetric Heads

Authors: Shashikant Verma, Shanmuganathan Raman

Abstract: Achieving realistic hair strand synthesis is essential for creating lifelike digital humans, but producing high-fidelity hair strand geometry remains a significant challenge. Existing methods require a complex data acquisition setup involving multi-view images captured in constrained studio environments. These methods also have long hair volume estimation and strand synthesis times, which hinder efficiency. We introduce PanoHair, a model that estimates head geometry as signed distance fields using knowledge distillation from a pre-trained generative teacher model for head synthesis. Our approach predicts semantic segmentation masks and 3D orientations specifically for the hair region of the estimated geometry. Our method is generative and can produce diverse hairstyles through latent space manipulations. For real images, our approach uses an inversion process to infer latent codes and produces visually appealing hair strands, offering a streamlined alternative to complex multi-view data acquisition setups. Given the latent code, PanoHair generates a clean manifold mesh for the hair region in under 5 seconds, along with semantic and orientation maps, a significant improvement over existing methods, as demonstrated in our experiments.

Submitted 26 August, 2025; originally announced August 2025.

arXiv:2508.07536

Physics-Informed Multimodal Bearing Fault Classification under Variable Operating Conditions using Transfer Learning

Authors: Tasfiq E. Alam, Md Manjurul Ahsan, Shivakumar Raman

Abstract: Accurate and interpretable bearing fault classification is critical for ensuring the reliability of rotating machinery, particularly under variable operating conditions where domain shifts can significantly degrade model performance. This study proposes a physics-informed multimodal convolutional neural network (CNN) with a late fusion architecture, integrating vibration and motor current signals alongside a dedicated physics-based feature extraction branch. The model incorporates a novel physics-informed loss function that penalizes physically implausible predictions based on characteristic bearing fault frequencies, namely the Ball Pass Frequency Outer (BPFO) and Ball Pass Frequency Inner (BPFI), derived from bearing geometry and shaft speed. Comprehensive experiments on the Paderborn University dataset demonstrate that the proposed physics-informed approach consistently outperforms a non-physics-informed baseline, achieving higher accuracy, fewer false classifications, and improved robustness across multiple data splits. To address performance degradation under unseen operating conditions, three transfer learning (TL) strategies are evaluated: Target-Specific Fine-Tuning (TSFT), Layer-Wise Adaptation Strategy (LAS), and Hybrid Feature Reuse (HFR). Results show that LAS yields the best generalization, with additional performance gains when combined with physics-informed modeling. Validation on the KAIST bearing dataset confirms the framework's cross-dataset applicability, achieving up to 98 percent accuracy.
Statistical hypothesis testing further verifies significant improvements (p < 0.01) in classification performance. The proposed framework demonstrates the potential of integrating domain knowledge with data-driven learning to achieve robust, interpretable, and generalizable fault diagnosis for real-world industrial applications.
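BPFO and BPFI follow from the standard rolling-bearing kinematic formulas (assuming pure rolling without slip), which make the "derived from bearing geometry and shaft speed" step concrete. The bearing geometry below is hypothetical, not the bearing used in the Paderborn or KAIST test rigs, and this sketch does not reproduce the paper's loss function.

```python
import math

def bearing_fault_frequencies(n_balls, ball_diam, pitch_diam, shaft_rpm, contact_angle_deg=0.0):
    """Return (BPFO, BPFI) in Hz from bearing geometry and shaft speed.

    Standard kinematic formulas, assuming pure rolling without slip:
      BPFO = (n / 2) * f_r * (1 - (d / D) * cos(phi))
      BPFI = (n / 2) * f_r * (1 + (d / D) * cos(phi))
    where f_r is the shaft rotation frequency, d the ball diameter,
    D the pitch diameter, and phi the contact angle.
    """
    f_r = shaft_rpm / 60.0  # shaft rotation frequency in Hz
    ratio = (ball_diam / pitch_diam) * math.cos(math.radians(contact_angle_deg))
    half = 0.5 * n_balls * f_r
    return half * (1.0 - ratio), half * (1.0 + ratio)

# Hypothetical bearing: 9 balls, 7.94 mm ball diameter, 39.04 mm pitch
# diameter, zero contact angle, shaft at 1,500 rpm (25 Hz).
bpfo, bpfi = bearing_fault_frequencies(9, 7.94, 39.04, 1500)
print(f"BPFO = {bpfo:.1f} Hz, BPFI = {bpfi:.1f} Hz")  # BPFO = 89.6 Hz, BPFI = 135.4 Hz
```

A useful sanity check is that BPFO + BPFI equals the number of balls times the shaft frequency (here 9 × 25 Hz = 225 Hz), and that BPFI is always the higher of the two.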

Submitted 10 August, 2025; originally announced August 2025.

arXiv:2508.06845

Hybrid Machine Learning Framework for Predicting Geometric Deviations from 3D Surface Metrology

Authors: Hamidreza Samadi, Md Manjurul Ahsan, Shivakumar Raman

Abstract: This study addresses the challenge of accurately forecasting geometric deviations in manufactured components using advanced 3D surface analysis. Despite progress in modern manufacturing, maintaining dimensional precision remains difficult, particularly for complex geometries. We present a methodology that employs a high-resolution 3D scanner to acquire multi-angle surface data from 237 components produced across different batches. The data were processed through precise alignment, noise reduction, and merging techniques to generate accurate 3D representations. A hybrid machine learning framework was developed, combining convolutional neural networks for feature extraction with gradient-boosted decision trees for predictive modeling. The proposed system achieved a prediction accuracy of 0.012 mm at a 95% confidence level, representing a 73% improvement over conventional statistical process control methods. In addition to improved accuracy, the model revealed hidden correlations between manufacturing parameters and geometric deviations. This approach offers significant potential for automated quality control, predictive maintenance, and design optimization in precision manufacturing, and the resulting dataset provides a strong foundation for future predictive modeling research.

Submitted 9 August, 2025; originally announced August 2025.

arXiv:2507.18532

doi 10.21227/bpqn-9a12

COT-AD: Cotton Analysis Dataset

Authors: Akbar Ali, Mahek Vyas, Soumyaratna Debnath, Chanda Grover Kamra, Jaidev Sanjay Khalane, Reuben Shibu Devanesan, Indra Deep Mastan, Subramanian Sankaranarayanan, Pankaj Khanna, Shanmuganathan Raman

Abstract: This paper presents COT-AD, a comprehensive dataset designed to enhance cotton crop analysis through computer vision. Comprising over 25,000 images captured throughout the cotton growth cycle, 5,000 of which are annotated, COT-AD includes aerial imagery for field-scale detection and segmentation and high-resolution DSLR images documenting key diseases. The annotations cover pest and disease recognition, vegetation, and weed analysis, addressing a critical gap in cotton-specific agricultural datasets. COT-AD supports tasks such as classification, segmentation, image restoration, enhancement, deep generative model-based cotton crop synthesis, and early disease management, advancing data-driven crop management.

Submitted 24 July, 2025; originally announced July 2025.

Comments: Dataset publicly available at: https://ieee-dataport.org/documents/cot-adcotton-analysis-dataset. Accepted to IEEE International Conference on Image Processing (ICIP) 2025

ACM Class: I.4.9; I.5.4; H.2.8

arXiv:2507.05653

AAPA: An Archetype-Aware Predictive Autoscaler with Uncertainty Quantification for Serverless Workloads on Kubernetes

Authors: Guilin Zhang, Srinivas Vippagunta, Raghavendra Nandagopal, Suchitra Raman, Jeff Xu, Marcus Pfeiffer, Shreeshankar Chatterjee, Ziqi Tan, Wulan Guo, Hailong Jiang

Abstract: Serverless platforms such as Kubernetes are increasingly adopted in high-performance computing, yet autoscaling remains challenging under highly dynamic and heterogeneous workloads. Existing approaches often rely on uniform reactive policies or unconditioned predictive models, ignoring both workload semantics and prediction uncertainty. We present AAPA, an archetype-aware predictive autoscaler that classifies workloads into four behavioral patterns (SPIKE, PERIODIC, RAMP, and STATIONARY) and applies tailored scaling strategies with confidence-based adjustments. To support reproducible evaluation, we release AAPAset, a weakly labeled dataset of 300,000 Azure Functions workload windows spanning diverse patterns. AAPA reduces SLO violations by up to 50% and lowers latency by 40% compared to the Kubernetes Horizontal Pod Autoscaler (HPA), albeit at 2-8x higher resource usage under spike-dominated conditions. To assess trade-offs, we propose the Resource Efficiency Index (REI), a unified metric balancing performance, cost, and scaling smoothness. Our results demonstrate the importance of modeling workload heterogeneity and uncertainty in autoscaling design.
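To make the four archetype names concrete, here is a rule-based classifier for one window of invocation counts. The abstract does not describe AAPA's actual classifier (AAPAset is only weakly labeled), so the rules, their order, and every threshold (`spike_ratio`, `ramp_frac`, `acf_min`) are hypothetical choices made for illustration.

```python
import math
import statistics

def _slope(ys):
    """Least-squares slope of ys against time steps 0..n-1."""
    n = len(ys)
    mx, my = (n - 1) / 2, statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in enumerate(ys))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den

def _autocorr(ys, lag):
    """Sample autocorrelation at the given lag (0.0 for a constant series)."""
    m = statistics.fmean(ys)
    dev = [y - m for y in ys]
    var = sum(d * d for d in dev)
    if var == 0:
        return 0.0
    return sum(dev[t] * dev[t + lag] for t in range(len(ys) - lag)) / var

def classify_window(ys, spike_ratio=3.0, ramp_frac=0.5, acf_min=0.5):
    """Assign a hypothetical archetype label to one window of invocation counts."""
    if len(ys) < 4:
        raise ValueError("need at least 4 samples")
    med, mean = statistics.median(ys), statistics.fmean(ys)
    if med > 0 and max(ys) > spike_ratio * med:
        return "SPIKE"       # a short burst far above the typical level
    if mean > 0 and abs(_slope(ys)) * (len(ys) - 1) > ramp_frac * mean:
        return "RAMP"        # the trend shifts the level by a large fraction of the mean
    lags = range(2, len(ys) // 2 + 1)
    if max((_autocorr(ys, lag) for lag in lags), default=0.0) > acf_min:
        return "PERIODIC"    # strong self-similarity at some lag
    return "STATIONARY"

hourly = [10 + 5 * math.sin(2 * math.pi * t / 6) for t in range(24)]
print(classify_window(hourly))  # PERIODIC
```

The checks run in a fixed order (spike, ramp, periodic) because a steady ramp also shows high autocorrelation at every lag and would otherwise be mislabeled as periodic.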

Submitted 16 July, 2025; v1 submitted 8 July, 2025; originally announced July 2025.

Comments: 6 pages, 4 figures, 1 table. First three authors contributed equally. Correspondence to Hailong Jiang

arXiv:2506.22850

DMD-Net: Deep Mesh Denoising Network

Authors: Aalok Gangopadhyay, Shashikant Verma, Shanmuganathan Raman

Abstract: We present Deep Mesh Denoising Network (DMD-Net), an end-to-end deep learning framework for solving the mesh denoising problem. DMD-Net consists of a Graph Convolutional Neural Network in which aggregation is performed in both the primal and the dual graph. This is realized as an asymmetric two-stream network containing a primal-dual fusion block that enables communication between the primal stream and the dual stream. We develop a Feature Guided Transformer (FGT) paradigm, which consists of a feature extractor, a transformer, and a denoiser. The feature extractor estimates local features, which guide the transformer to compute a transformation that is applied to the noisy input mesh to obtain a useful intermediate representation. This is further processed by the denoiser to obtain the denoised mesh. Our network is trained on a large-scale dataset of 3D objects. We perform exhaustive ablation studies to demonstrate that each component in our network is essential for obtaining the best performance. We show that our method obtains competitive or better results compared with state-of-the-art mesh denoising algorithms. We demonstrate that our method is robust to various kinds of noise and observe that it achieves excellent performance even in the presence of extremely high noise.
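The "primal" and "dual" graphs of a triangle mesh can be built directly from its face list: in the primal graph, vertices are nodes linked by mesh edges; in the dual graph, faces are nodes linked when they share an edge. This sketch uses that standard definition as an assumption; the abstract does not give DMD-Net's node features or edge weights.

```python
from collections import defaultdict

def mesh_primal_graph(faces):
    """Vertex adjacency: {vertex: sorted neighbor vertices} from triangle faces."""
    adj = defaultdict(set)
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            adj[u].add(v)
            adj[v].add(u)
    return {v: sorted(nbrs) for v, nbrs in sorted(adj.items())}

def mesh_dual_graph(faces):
    """Face adjacency: {face index: sorted indices of faces sharing an edge with it}."""
    edge_to_faces = defaultdict(list)
    for f, (a, b, c) in enumerate(faces):
        for u, v in ((a, b), (b, c), (c, a)):
            edge_to_faces[frozenset((u, v))].append(f)  # undirected edge key
    adj = {f: set() for f in range(len(faces))}
    for shared in edge_to_faces.values():
        for f in shared:
            adj[f].update(g for g in shared if g != f)
    return {f: sorted(nbrs) for f, nbrs in adj.items()}

# Three triangles: face 0 shares edge {0, 2} with face 1 and edge {1, 2} with face 2.
faces = [(0, 1, 2), (0, 2, 3), (2, 1, 4)]
print(mesh_dual_graph(faces))    # {0: [1, 2], 1: [0], 2: [0]}
print(mesh_primal_graph(faces))
```

A two-stream network would then run graph convolutions over each adjacency structure separately and exchange features between vertex and face nodes.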

Submitted 28 June, 2025; originally announced June 2025.

arXiv:2506.22833

SemFaceEdit: Semantic Face Editing on Generative Radiance Manifolds

Authors: Shashikant Verma, Shanmuganathan Raman

Abstract: Despite the multi-view consistency offered by 3D-aware GAN techniques, the resulting images often lack the capacity for localized editing. In response, generative radiance manifolds have emerged as an efficient approach for constrained point sampling within volumes, reducing computational demands and enabling the learning of fine details. This work introduces SemFaceEdit, a novel method that streamlines the appearance and geometric editing process by generating semantic fields on generative radiance manifolds. Using latent codes, our method disentangles the geometry and appearance associated with different facial semantics within the generated image. In contrast to existing methods that can change the appearance of the entire radiance field, our method enables precise editing of particular facial semantics while preserving the integrity of other regions. Our network comprises two key modules: the Geometry module, which generates semantic radiance and occupancy fields, and the Appearance module, which predicts RGB radiance. We jointly train both modules in adversarial settings to learn semantic-aware geometry and appearance descriptors. The Appearance module then conditions the appearance descriptors on their respective semantic latent codes, facilitating disentanglement and enhanced control. Our experiments highlight SemFaceEdit's superior performance in semantic field-based editing, particularly in achieving improved radiance field disentanglement.

Submitted 28 June, 2025; originally announced June 2025.

arXiv:2506.18172

STACT-Time: Spatio-Temporal Cross Attention for Cine Thyroid Ultrasound Time Series Classification

Authors: Irsyad Adam, Tengyue Zhang, Shrayes Raman, Zhuyu Qiu, Brandon Taraku, Hexiang Feng, Sile Wang, Ashwath Radhachandran, Shreeram Athreya, Vedrana Ivezic, Peipei Ping, Corey Arnold, William Speier

Abstract: Thyroid cancer is among the most common cancers in the United States. Thyroid nodules are frequently detected through ultrasound (US) imaging, and some require further evaluation via fine-needle aspiration (FNA) biopsy. Despite its effectiveness, FNA often leads to unnecessary biopsies of benign nodules, causing patient discomfort and anxiety. To address this, the American College of Radiology Thyroid Imaging Reporting and Data System (TI-RADS) was developed to reduce benign biopsies. However, such systems are limited by interobserver variability. Recent deep learning approaches have sought to improve risk stratification, but they often fail to use the rich temporal and spatial context provided by US cine clips, which contain dynamic global information and surrounding structural changes across various views. In this work, we propose the Spatio-Temporal Cross Attention for Cine Thyroid Ultrasound Time Series Classification (STACT-Time) model, a novel representation learning framework that integrates imaging features from US cine clips with features from segmentation masks automatically generated by a pretrained model. By leveraging self-attention and cross-attention mechanisms, our model captures the rich temporal and spatial context of US cine clips while enhancing feature representation through segmentation-guided learning. Our model improves malignancy prediction compared to state-of-the-art models, achieving a cross-validation precision of 0.91 (± 0.02) and an F1 score of 0.89 (± 0.02).
By reducing unnecessary biopsies of benign nodules while maintaining high sensitivity for malignancy detection, our model has the potential to enhance clinical decision-making and improve patient outcomes.
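Because F1 is the harmonic mean of precision P and recall (sensitivity) R, the two reported numbers roughly imply the recall by solving F1 = 2PR / (P + R) for R, which gives R = F1 · P / (2P − F1). This is only an approximation: the reported values are cross-validation means, and the mean of per-fold F1 scores is not exactly the F1 of mean precision and mean recall.

```python
def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given P and F1 (valid inputs satisfy f1 < 2 * precision)."""
    return f1 * precision / (2 * precision - f1)

print(round(implied_recall(0.91, 0.89), 2))  # 0.87
```

So a precision of 0.91 with an F1 of 0.89 is consistent with a sensitivity of roughly 0.87.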

Submitted 22 June, 2025; originally announced June 2025.

arXiv:2505.21385

Dynamic Vision from EEG Brain Recordings, How much does EEG know?

Authors: Prajwal Singh, Anupam Sharma, Pankaj Pandey, Krishna Miyapuram, Shanmuganathan Raman

Abstract: Reconstructing dynamic visual stimuli from EEG recordings is challenging due to the non-stationary and noisy nature of EEG signals and the limited availability of EEG-video datasets. Prior work has largely focused on static image reconstruction, leaving open the question of whether EEG carries sufficient information for dynamic video decoding. In this work, we present EEGVid, a framework that reconstructs dynamic video stimuli from EEG signals while systematically probing the information they encode. Our approach first learns an EEG representation and then uses these features for video synthesis with a temporally conditioned StyleGAN-ADA that maps EEG embeddings to specific frame positions. Through experiments on three datasets (SEED, EEG-Video Action, SEED-DV), we demonstrate that EEG supports semantically meaningful reconstruction of dynamic visual content, and we quantify how much EEG knows: (i) hemispheric asymmetry, with the left hemisphere more predictive of visual content and the right hemisphere of emotional content, (ii) the temporal lobe as the most informative region, and (iii) EEG timesteps 100-300 as the most critical for dynamic visual encoding. Importantly, while generative priors contribute fine spatial detail, EEG provides the semantic and temporal guidance necessary for reconstructing videos that align with the observed stimuli. This positions video generation not as a standalone generative benchmark, but as a means to visualize and validate the representational content of EEG in the context of dynamic vision.

Submitted 22 September, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

arXiv:2505.21252

doi 10.2312/pg.20231279

Hand Shadow Art: A Differentiable Rendering Perspective

Authors: Aalok Gangopadhyay, Prajwal Singh, Ashish Tiwari, Shanmuganathan Raman

Abstract: Shadow art is an exciting form of sculptural art that produces captivating artistic effects through the 2D shadows cast by 3D shapes. Hand shadows, also known as shadow puppetry or shadowgraphy, involve creating various shapes and figures with one's hands and fingers to cast meaningful shadows on a wall. In this work, we propose a differentiable rendering-based approach to deform hand models such that they cast a shadow consistent with a desired target image and the associated lighting configuration. We showcase the results of shadows cast by a pair of hands and the interpolation of hand poses between two desired shadow images. We believe that this work will be a useful tool for the graphics community.
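The geometric core of casting a shadow is projecting each point of the hand along the ray from the light onto the wall. This sketch assumes a single point light and a wall at the plane z = 0; it is not the paper's renderer or hand model, which would additionally rasterize such projections in a differentiable way so that an image loss can update the hand pose.

```python
def project_to_wall(point, light):
    """Shadow position of `point` on the wall plane z = 0, cast by a point light.

    Hypothetical setup: the wall is z = 0 and the point lies strictly between
    the wall and the light along z.
    """
    lx, ly, lz = light
    px, py, pz = point
    if not lz > pz > 0:
        raise ValueError("point must satisfy 0 < z_point < z_light")
    t = lz / (lz - pz)  # ray parameter at which light + t * (point - light) reaches z = 0
    return (lx + t * (px - lx), ly + t * (py - ly))

light = (0.0, 0.0, 10.0)
print(project_to_wall((1.0, 1.0, 5.0), light))  # (2.0, 2.0)
```

Since t = z_light / (z_light − z_point) grows as the point approaches the light, moving a hand toward the light enlarges its shadow; the projection is also smooth in the point's coordinates, which is what makes gradient-based pose optimization possible.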

Submitted 27 May, 2025; originally announced May 2025.

Comments: Published in Pacific Graphics 2023

arXiv:2505.14892

Scaling Laws for State Dynamics in Large Language Models

Authors: Jacob X Li, Shreyas S Raman, Jessica Wan, Fahad Samman, Jazlyn Lin

Abstract: Large Language Models (LLMs) are increasingly used in tasks requiring internal state tracking, yet their ability to model state transition dynamics remains poorly understood. We evaluate how well LLMs capture deterministic state dynamics across three domains: Box Tracking, Abstract DFA Sequences, and Complex Text Games, each formalizable as a finite-state system. Across tasks, we find that next-state prediction accuracy degrades with increasing state-space size and sparse transitions. GPT-2 XL reaches about 70% accuracy in low-complexity settings but drops below 30% when the number of boxes exceeds 5 or the number of states exceeds 10. In DFA tasks, Pythia-1B fails to exceed 50% accuracy when there are more than 10 states and fewer than 30 transitions. Through activation patching, we identify attention heads responsible for propagating state information: GPT-2 XL Layer 22 Head 20, and Pythia-1B heads at Layers 10, 11, 12, and 14. While these heads successfully move relevant state features, action information is not reliably routed to the final token, indicating weak joint state-action reasoning. Our results suggest that state tracking in LLMs emerges from distributed interactions of next-token heads rather than explicit symbolic computation.
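Because each task is a finite-state system, the correct answer can be computed exactly by running the transition function, which is what makes next-state accuracy measurable. The DFA generator and prompt wording below are hypothetical and are not the paper's benchmark format.

```python
import random

def random_dfa(n_states, alphabet, seed=0):
    """Random total transition function {(state, symbol): next_state}."""
    rng = random.Random(seed)
    return {(s, a): rng.randrange(n_states) for s in range(n_states) for a in alphabet}

def run_dfa(delta, start, actions):
    """Apply each action in order and return the final state (the ground truth)."""
    state = start
    for a in actions:
        state = delta[(state, a)]
    return state

def make_example(delta, start, actions):
    """Render a hypothetical text prompt together with its ground-truth answer."""
    prompt = f"Start in state {start}. Apply: {' '.join(actions)}. Final state?"
    return prompt, run_dfa(delta, start, actions)

# Parity DFA: symbol "a" toggles the state, symbol "b" leaves it unchanged.
parity = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
print(make_example(parity, 0, ["a", "b", "a"]))
```

Scaling `n_states` in `random_dfa` is one simple way to vary the state-space size that the abstract reports accuracy degrading with.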

Submitted 20 May, 2025; originally announced May 2025.

Comments: 16 pages; 23 figures

ACM Class: I.2.7; I.2.1; I.2.4; I.5.4

arXiv:2504.02465

RASP: Revisiting 3D Anamorphic Art for Shadow-Guided Packing of Irregular Objects

Authors: Soumyaratna Debnath, Ashish Tiwari, Kaustubh Sadekar, Shanmuganathan Raman

Abstract: Recent advancements in learning-based methods have opened new avenues for exploring and interpreting art forms, such as shadow art, origami, and sketch art, through computational models. One notable visual art form is 3D Anamorphic Art, in which an ensemble of arbitrarily shaped 3D objects creates a realistic and meaningful expression when observed from a particular viewpoint and loses its coherence from other viewpoints. In this work, we build on insights from 3D Anamorphic Art to perform 3D object arrangement. We introduce RASP, a differentiable-rendering-based framework that arranges arbitrarily shaped 3D objects within a bounded volume via shadow- (or silhouette-) guided optimization, aiming for minimal inter-object spacing and near-maximal occupancy. Furthermore, we propose a novel SDF-based formulation to handle inter-object intersection and container extrusion. We demonstrate that RASP can be extended to part assembly alongside object packing by treating 3D objects as "parts" of another 3D object. Finally, we present artistic illustrations of multi-view anamorphic art, achieving meaningful expressions from multiple viewpoints within a single ensemble.
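Signed distance functions (SDFs) turn both constraints named above into simple penalties: a sample point of one object that has negative SDF with respect to another object is penetrating it, and a point with positive SDF with respect to the container is sticking out. This sketch uses a sphere and an axis-aligned box as hypothetical stand-ins for arbitrary meshes; RASP's exact formulation is not given in the abstract.

```python
import math

def sphere_sdf(p, center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return math.dist(p, center) - radius

def box_sdf(p, center, half_extents):
    """Signed distance to an axis-aligned box (negative inside)."""
    q = [abs(pi - ci) - hi for pi, ci, hi in zip(p, center, half_extents)]
    outside = math.sqrt(sum(max(qi, 0.0) ** 2 for qi in q))
    inside = min(max(q), 0.0)
    return outside + inside

def intersection_penalty(points_a, sdf_b):
    """Total penetration depth of A's sample points inside object B (0 if disjoint)."""
    return sum(max(0.0, -sdf_b(p)) for p in points_a)

def extrusion_penalty(points, container_sdf):
    """Total distance by which sample points stick out of the container."""
    return sum(max(0.0, container_sdf(p)) for p in points)

def unit_ball(p):
    return sphere_sdf(p, (0.0, 0.0, 0.0), 1.0)

def unit_box(p):
    return box_sdf(p, (0.0, 0.0, 0.0), (1.0, 1.0, 1.0))

samples = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(intersection_penalty(samples, unit_ball))  # 0.5
print(extrusion_penalty(samples, unit_box))      # 1.0
```

Both penalties are zero exactly when the constraints are satisfied, so they can be added to a rendering loss and minimized together by gradient descent.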

Submitted 3 April, 2025; originally announced April 2025.

Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2025

arXiv:2503.13344

STEP: Simultaneous Tracking and Estimation of Pose for Animals and Humans

Authors: Shashikant Verma, Harish Katti, Soumyaratna Debnath, Yamuna Swamy, Shanmuganathan Raman

Abstract: We introduce STEP, a novel framework that uses Transformer-based discriminative model prediction for simultaneous tracking and estimation of pose across diverse animal species and humans. We are inspired by the fact that the human brain exploits spatiotemporal continuity and performs concurrent localization and pose estimation despite the specialization of brain areas for form and motion processing. Traditional discriminative models typically require predefined target states to determine model weights, a challenge we address through the Gaussian Map Soft Prediction (GMSP) and Offset Map Regression Adapter (OMRA) modules. These modules remove the need for keypoint target states as input, streamlining the process. Our method starts with a known target state in the initial frame of a given video sequence. It then tracks the target and estimates keypoints of anatomical importance as output for subsequent frames. Unlike prevalent top-down pose estimation methods, our approach does not rely on per-frame target detections, owing to its tracking capability. This significantly improves inference efficiency and widens potential applications. We train and validate our approach on datasets encompassing diverse species. Our experiments demonstrate superior results compared to existing methods, opening doors to various applications, including but not limited to action recognition and behavioral analysis.
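The internals of GMSP are not described in the abstract; as an assumption, "Gaussian map" is read here as the conventional Gaussian heatmap encoding of a keypoint, in which a map peaks at the keypoint and decays with distance, and the location is recovered by taking the argmax.

```python
import math

def gaussian_heatmap(width, height, cx, cy, sigma=2.0):
    """Map with value exp(-((x - cx)^2 + (y - cy)^2) / (2 * sigma^2)), indexed map[y][x]."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def argmax_2d(heatmap):
    """Recover the (x, y) location of the map's maximum."""
    _, x, y = max((v, x, y) for y, row in enumerate(heatmap) for x, v in enumerate(row))
    return x, y

hm = gaussian_heatmap(16, 12, cx=5, cy=7)
print(argmax_2d(hm))  # (5, 7)
```

A soft target like this tolerates small localization errors better than a single one-hot pixel, since nearby predictions still receive high values.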

Submitted 20 March, 2025; v1 submitted 17 March, 2025; originally announced March 2025.
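STEP predicts keypoints through Gaussian Map Soft Prediction rather than from explicit keypoint target states. The abstract does not describe that module's internals, so the sketch below only illustrates the generic idea of a Gaussian keypoint map: render a 2D Gaussian centered on a keypoint, then decode the location back by arg-max. The grid size, sigma, and function names are hypothetical and are not taken from the paper.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Render a 2D Gaussian peak centered at (cx, cy) on a height x width grid."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2)) for x in range(width)]
        for y in range(height)
    ]

def decode_argmax(heatmap):
    """Return the (x, y) location of the strongest response in the map."""
    best_val, best_xy = -math.inf, (0, 0)
    for y, row in enumerate(heatmap):
        for x, v in enumerate(row):
            if v > best_val:
                best_val, best_xy = v, (x, y)
    return best_xy

# Hypothetical keypoint at column 10, row 20 on a 32 x 48 map.
hm = gaussian_heatmap(32, 48, cx=10, cy=20, sigma=2.0)
print(decode_argmax(hm))  # (10, 20)
```

Soft Gaussian targets of this kind are a common design choice because they give the network a smooth training signal around the true location instead of a single hot pixel.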

arXiv:2502.17524

Multimodal Bearing Fault Classification Under Variable Conditions: A 1D CNN with Transfer Learning

Authors: Tasfiq E. Alam, Md Manjurul Ahsan, Shivakumar Raman

Abstract: Bearings play an integral role in ensuring the reliability and efficiency of rotating machinery by reducing friction and handling critical loads. Bearing failures, which constitute up to 90% of mechanical faults, highlight the need for reliable condition monitoring and fault detection. This study proposes a multimodal bearing fault classification approach that relies on vibration and motor phase current signals within a one-dimensional convolutional neural network (1D CNN) framework. The method fuses features from multiple signals to enhance the accuracy of fault detection. Under the baseline condition (1,500 rpm, 0.7 Nm load torque, and 1,000 N radial force), the model reaches an accuracy of 96% with the addition of L2 regularization. This represents a notable improvement of 2% compared to the non-regularized model. In addition, the model demonstrates robust performance across three distinct operating conditions by employing transfer learning (TL) strategies. Among the tested TL variants, the approach that preserves parameters up to the first max-pool layer and then adjusts subsequent layers achieves the highest performance. While this approach attains excellent accuracy across varied conditions, it requires more computational time due to its greater number of trainable parameters. To address resource constraints, less computationally intensive models offer feasible trade-offs, albeit at a slight accuracy cost.
Overall, this multimodal 1D CNN framework with late fusion and TL strategies lays a foundation for more accurate, adaptable, and efficient bearing fault classification in industrial environments with variable operating conditions.

Submitted 23 February, 2025; originally announced February 2025.

arXiv:2501.01174

L3D-Pose: Lifting Pose for 3D Avatars from a Single Camera in the Wild

Authors: Soumyaratna Debnath, Harish Katti, Shashikant Verma, Shanmuganathan Raman

Abstract: While 2D pose estimation has advanced our ability to interpret body movements in animals and primates, it is limited by the lack of depth information, constraining its application range. 3D pose estimation provides a more comprehensive solution by incorporating spatial depth, yet creating extensive 3D pose datasets for animals is challenging due to their dynamic and unpredictable behaviours in natural settings. To address this, we propose a hybrid approach that uses rigged avatars and a generation pipeline to produce synthetic datasets that provide the 3D annotations needed for training. Our method introduces a simple attention-based MLP network for converting 2D poses to 3D, designed to be independent of the input image to ensure scalability for poses in natural environments. Additionally, we identify that existing anatomical keypoint detectors are insufficient for accurate pose retargeting onto arbitrary avatars. To overcome this, we present a lookup table based on a deep pose estimation method, built from a synthetic collection of diverse actions performed by rigged avatars. Our experiments demonstrate the effectiveness and efficiency of this lookup table-based retargeting approach. Overall, we propose a comprehensive framework with systematically synthesized datasets for lifting poses from 2D to 3D and then use it to retarget motion from wild settings onto arbitrary avatars.

Submitted 2 January, 2025; originally announced January 2025.

Comments: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

arXiv:2412.16942

BloomCoreset: Fast Coreset Sampling using Bloom Filters for Fine-Grained Self-Supervised Learning

Authors: Prajwal Singh, Gautam Vashishtha, Indra Deep Mastan, Shanmuganathan Raman

Abstract: The success of deep learning in supervised fine-grained recognition for domain-specific tasks relies heavily on expert annotations. The Open-Set for fine-grained Self-Supervised Learning (SSL) problem aims to enhance performance on downstream tasks by strategically sampling a subset of images (the Core-Set) from a large pool of unlabeled data (the Open-Set). In this paper, we propose a novel method, BloomCoreset, that significantly reduces sampling time from the Open-Set while preserving the quality of samples in the coreset. To achieve this, we utilize Bloom filters as an innovative hashing mechanism to store both low- and high-level features of the fine-grained dataset, as captured by Open-CLIP, in a space-efficient manner that enables rapid retrieval of the coreset from the Open-Set. To show the effectiveness of the sampled coreset, we integrate the proposed method into the state-of-the-art fine-grained SSL framework, SimCore [1]. The proposed algorithm drastically outperforms the sampling strategy of the baseline in SimCore [1], achieving a 98.5% reduction in sampling time at a mere 0.83% average trade-off in accuracy, calculated across 11 downstream datasets.

Submitted 22 December, 2024; originally announced December 2024.

Comments: Accepted at ICASSP 2025
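BloomCoreset stores fine-grained features in Bloom filters so that open-set samples can be matched quickly. The sketch below is a generic Bloom filter applied to coarsely quantized feature vectors, not the paper's implementation: Open-CLIP feature extraction is not reproduced, and the quantization step, filter size, hash count, and toy feature values are all hypothetical.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions per item in an m-bit array."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: bytes):
        # Derive k bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "little") + key).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, key: bytes):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: bytes):
        # False positives are possible; false negatives are not.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

def quantize(features, step=0.25):
    """Hypothetical feature key: round each component onto a coarse grid."""
    return ",".join(str(round(f / step)) for f in features).encode()

# Index (hypothetical) features of the fine-grained target dataset.
target = [[0.1, 0.9, 0.4], [0.7, 0.2, 0.5]]
bf = BloomFilter()
for f in target:
    bf.add(quantize(f))

# Keep open-set samples whose quantized features hit the filter.
open_set = [[0.12, 0.88, 0.41], [0.95, 0.95, 0.05]]
coreset = [f for f in open_set if quantize(f) in bf]
```

Each membership query costs a fixed number of hash computations regardless of how many features were indexed, which is the property that makes Bloom filters attractive for screening a large unlabeled pool.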

arXiv:2412.16860

Diffusion-Based Approaches in Medical Image Generation and Analysis

Authors: Abdullah al Nomaan Nafi, Md. Alamgir Hossain, Rakib Hossain Rifat, Md Mahabub Uz Zaman, Md Manjurul Ahsan, Shivakumar Raman

Abstract: Data scarcity in medical imaging poses significant challenges due to privacy concerns. Diffusion models, a recent generative modeling technique, offer a potential solution by generating synthetic and realistic data. However, questions remain about the performance of convolutional neural network (CNN) models on original and synthetic datasets. If diffusion-generated samples can help CNN models perform comparably to those trained on original datasets, reliance on patient-specific data for training CNNs might be reduced. In this study, we investigated the effectiveness of diffusion models for generating synthetic medical images to train CNNs in three domains: Brain Tumor MRI, Acute Lymphoblastic Leukemia (ALL), and SARS-CoV-2 CT scans. A diffusion model was trained to generate synthetic datasets for each domain. Pre-trained CNN architectures were then trained on these synthetic datasets and evaluated on unseen real data. CNNs trained on synthetic data achieved promising classification performance in all three domains. Local Interpretable Model-Agnostic Explanations (LIME) analysis revealed that the models focused on relevant image features for classification. This study demonstrates the potential of diffusion models to generate synthetic medical images for training CNNs in medical image analysis.

Submitted 22 December, 2024; originally announced December 2024.

arXiv:2412.05313

λ: A Benchmark for Data-Efficiency in Long-Horizon Indoor Mobile Manipulation Robotics

Authors: Ahmed Jaafar, Shreyas Sundara Raman, Sudarshan Harithas, Yichen Wei, Sofia Juliani, Anneke Wernerfelt, Benedict Quartey, Ifrah Idrees, Jason Xinyu Liu, Stefanie Tellex

Abstract: Learning to execute long-horizon mobile manipulation tasks is crucial for advancing robotics in household and workplace settings. However, current approaches are typically data-inefficient, underscoring the need for improved models and for realistically sized benchmarks to evaluate their efficiency. To address this, we introduce the LAMBDA (λ) benchmark (Long-horizon Actions for Mobile-manipulation Benchmarking of Directed Activities), which evaluates the data efficiency of models on language-conditioned, long-horizon, multi-room, multi-floor, pick-and-place tasks using a dataset of manageable size that is more feasible to collect. Our benchmark includes 571 human-collected demonstrations that provide realism and diversity in simulated and real-world settings. Unlike planner-generated data, these trajectories offer natural variability and replay-verifiability, ensuring robust learning and evaluation. We leverage λ to benchmark current end-to-end learning methods and a modular neuro-symbolic approach that combines foundation models with task and motion planning. We find that learning methods, even when pretrained, yield lower success rates, while the neuro-symbolic method performs significantly better and requires less data.

Submitted 1 August, 2025; v1 submitted 28 November, 2024; originally announced December 2024.

Comments: Accepted to IROS 2025. Sudarshan Harithas and Yichen Wei contributed equally. 8 pages. 7 figures

arXiv:2411.19903

Incremental Multi-Scene Modeling via Continual Neural Graphics Primitives

Authors: Prajwal Singh, Ashish Tiwari, Gautam Vashishtha, Shanmuganathan Raman

Abstract: Neural radiance fields (NeRF) have revolutionized photorealistic rendering of novel views for 3D scenes. Despite their growing popularity and efficiency as 3D resources, NeRFs face scalability challenges due to the need for separate models per scene and the cumulative increase in training time for multiple scenes. The potential for incrementally encoding multiple 3D scenes into a single NeRF model remains largely unexplored. To address this, we introduce Continual-Neural Graphics Primitives (C-NGP), a novel continual learning framework that integrates multiple scenes incrementally into a single neural radiance field. Using a generative replay approach, C-NGP adapts to new scenes without requiring access to old data. We demonstrate that C-NGP can accommodate multiple scenes without increasing the parameter count, producing high-quality novel-view renderings on synthetic and real datasets. Notably, C-NGP models all 8 scenes from the Real-LLFF dataset together, with only a 2.2% drop in PSNR compared to vanilla NeRF, which models each scene independently. Further, C-NGP allows multiple style edits in the same network.

Submitted 26 August, 2025; v1 submitted 29 November, 2024; originally announced November 2024.

arXiv:2411.08673

doi 10.1145/3680530.3695448

ScribGen: Generating Scribble Art Through Metaheuristics

Authors: Soumyaratna Debnath, Ashish Tiwari, Shanmuganathan Raman

Abstract: Art has long been a medium for individuals to engage with the world. Scribble art, a form of abstract visual expression, features spontaneous, gestural strokes made with pens or brushes. These dynamic and expressive compositions, created quickly and impulsively, reveal intricate patterns and hidden meanings upon closer inspection. While scribble art is often associated with spontaneous expression and experimentation, it can also be planned and intentional. Some artists use scribble techniques as a starting point for their creative process, exploring the possibilities of line, shape, and texture before refining their work into more polished compositions. From ancient cave paintings to modern abstract sketches and doodles, scribble art has evolved with civilizations, reflecting diverse artistic movements and cultural influences. This evolution highlights its universal appeal, transcending language and cultural barriers and connecting people through the shared experience of creating art.

Submitted 13 November, 2024; originally announced November 2024.

Comments: SIGGRAPH Asia 2024

arXiv:2411.05286

Metrology and Manufacturing-Integrated Digital Twin (MM-DT) for Advanced Manufacturing: Insights from CMM and FARO Arm Measurements

Authors: Hamidreza Samadi, Md Manjurul Ahsan, Shivakumar Raman

Abstract: Metrology, the science of measurement, plays a key role in Advanced Manufacturing (AM) by ensuring quality control, process optimization, and predictive maintenance. However, it has often been overlooked in AM domains due to the current focus on automation and the complexity of integrating precise measurement systems. Over the years, Digital Twin (DT) technology in AM has gained much attention due to its potential to address these challenges through physical data integration and real-time monitoring, though its use in metrology remains limited. Taking this into account, this study proposes a novel framework, the Metrology and Manufacturing-Integrated Digital Twin (MM-DT), which focuses on data from two metrology tools: Coordinate Measuring Machines (CMM) and FARO Arm devices. In this study, we measured 20 manufactured parts, with each part assessed twice under different temperature conditions. Using Ensemble Machine Learning methods, our proposed approach predicts measurement deviations accurately, achieving an R² score of 0.91 and reducing the Root Mean Square Error (RMSE) to 1.59 micrometers. Our MM-DT framework demonstrates its efficiency by improving metrology processes and offers valuable insights for researchers and practitioners who aim to increase manufacturing precision and quality.

Submitted 7 November, 2024; originally announced November 2024.
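The MM-DT abstract reports its accuracy as an R² score and an RMSE. For reference, the sketch below computes both metrics with their standard definitions. The ensemble models themselves are not reproduced, and the four deviation values are made up for illustration rather than taken from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the measurements."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Illustrative deviations in micrometers (made-up values, not the paper's data).
measured = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]
print(rmse(measured, predicted), r2_score(measured, predicted))  # RMSE ≈ 0.707, R² = 0.9
```

RMSE keeps the measurement units (micrometers here), while R² is unitless and compares the model's error to that of always predicting the mean.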

arXiv:2411.01299

PMI-DT: Leveraging Digital Twins and Machine Learning for Predictive Modeling and Inspection in Manufacturing

Authors: Chas Hamel, Md Manjurul Ahsan, Shivakumar Raman

Abstract: Over the years, Digital Twins (DTs) have become popular in Advanced Manufacturing (AM) due to their ability to improve production efficiency and quality. By creating virtual replicas of physical assets, DTs support real-time monitoring, enable predictive modeling, and improve operational performance. However, integrating data from physical systems into reliable predictive models, particularly in precision measurement and failure prevention, is often challenging and less explored. This study introduces a Predictive Maintenance and Inspection Digital Twin (PMI-DT) framework with a focus on precision measurement and predictive quality assurance using a 3D-printed 1''-4 ACME bolt, the CyberGage 360 vision inspection system, SolidWorks, and Microsoft Azure. In this approach, dimensional inspection data is combined with fatigue test results to create a model for detecting failures. Using Machine Learning (ML), specifically Random Forest and Decision Tree models, the proposed approaches predicted bolt failure from real-time data with 100% accuracy. Our preliminary results show that Max Position (30%) and Max Load (24%) are the main factors contributing to failure. We expect the PMI-DT framework to reduce inspection time and improve predictive maintenance, ultimately giving manufacturers a practical way to boost product quality and reliability using DTs in AM.

Submitted 2 November, 2024; originally announced November 2024.

arXiv:2409.02716

LIPIDS: Learning-based Illumination Planning In Discretized (Light) Space for Photometric Stereo

Authors: Ashish Tiwari, Mihir Sutariya, Shanmuganathan Raman

Abstract: Photometric stereo is a powerful method for obtaining per-pixel surface normals from differently illuminated images of an object. While several methods address photometric stereo with different image (or light) counts, ranging from one or two to a hundred, very few focus on learning the optimal lighting configuration. Finding an optimal configuration is challenging due to the vast number of possible lighting directions. Moreover, exhaustively sampling all possibilities is impractical due to time and resource constraints. Photometric stereo methods have demonstrated promising performance on existing datasets, which feature limited light directions sparsely sampled from the light space. This raises the question: can we optimally utilize these datasets for illumination planning? In this work, we introduce LIPIDS (Learning-based Illumination Planning In Discretized light Space) to achieve minimal and optimal lighting configurations for photometric stereo under arbitrary light distributions. We propose a Light Sampling Network (LSNet) that optimizes lighting directions for a fixed number of lights by minimizing the normal loss through a normal regression network. The learned light configurations can be used directly to estimate surface normals during inference, even with an off-the-shelf photometric stereo method.
Extensive qualitative and quantitative analyses on synthetic and real-world datasets show that photometric stereo under lighting configurations learned through LIPIDS either surpasses or nearly matches existing illumination planning methods across different photometric stereo backbones.

Submitted 1 September, 2024; originally announced September 2024.

Comments: Accepted in WACV 2025
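LIPIDS chooses which light directions to use for a fixed light budget. For context, the classic calibrated Lambertian photometric stereo model (Woodham's formulation, not the paper's LSNet) recovers a surface normal from three non-coplanar lights by solving a 3x3 linear system I = L·(ρn). The sketch below assumes a Lambertian surface point lit by all three lights (no shadows); the light directions, albedo, and test normal are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 system a @ x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions are (nearly) coplanar")
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]  # replace column `col` with the intensities
        x.append(det3(m) / d)
    return x

def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def lambertian_normal(lights, intensities):
    """Recover albedo and unit normal from I_i = albedo * (l_i . n).

    Assumes all three lights illuminate the point (no attached shadows).
    """
    g = solve3(lights, intensities)  # g = albedo * n
    albedo = math.sqrt(sum(c * c for c in g))
    return albedo, [c / albedo for c in g]

# Hypothetical setup: three light directions and a known test surface point.
lights = [normalize(v) for v in ([0, 0, 1], [1, 0, 1], [0, 1, 1])]
true_n, true_albedo = normalize([0.2, -0.1, 1.0]), 0.8
I = [true_albedo * sum(l[k] * true_n[k] for k in range(3)) for l in lights]
albedo, n = lambertian_normal(lights, I)
```

Nearly coplanar light directions make this system ill-conditioned, so small intensity noise produces large normal errors. That sensitivity hints at why the placement of a fixed number of lights matters.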

arXiv:2409.00877

Digital Twins in Additive Manufacturing: A Systematic Review

Authors: Md Manjurul Ahsan, Yingtao Liu, Shivakumar Raman, Zahed Siddique

Abstract: Digital Twins (DTs) are becoming popular in Additive Manufacturing (AM) due to their ability to create virtual replicas of physical components of AM machines, which helps in real-time production monitoring. Advanced techniques such as Machine Learning (ML), Augmented Reality (AR), and simulation-based models play key roles in developing intelligent and adaptable DTs in manufacturing processes. However, questions remain regarding scalability, the integration of high-quality data, and the computational power required for real-time applications in developing DTs. Understanding the current state of DTs in AM is essential to address these challenges and fully utilize their potential in advancing AM processes. Considering this opportunity, this work aims to provide a comprehensive overview of DTs in AM by addressing the following four research questions: (1) What are the key types of DTs used in AM and their specific applications? (2) What are the recent developments and implementations of DTs? (3) How are DTs employed in process improvement and hybrid manufacturing? (4) How are DTs integrated with Industry 4.0 technologies? By discussing current applications and techniques, we aim to offer a better understanding and potential future research directions for researchers and practitioners in AM and DTs.

Submitted 1 November, 2024; v1 submitted 1 September, 2024; originally announced September 2024.

arXiv:2409.00674

MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo

Authors: Ashish Tiwari, Satoshi Ikehata, Shanmuganathan Raman

Abstract: Photometric stereo typically demands intricate data acquisition setups involving multiple light sources to recover surface normals accurately. In this paper, we propose MERLiN, an attention-based hourglass network that integrates single image-based inverse rendering and relighting within one unified framework. We evaluate the performance of photometric stereo methods using these relit images and demonstrate how they can circumvent the underlying challenge of complex data acquisition. Our physically-based model is trained on a large synthetic dataset containing complex shapes with spatially varying BRDF and is designed to handle indirect illumination effects to improve material reconstruction and relighting. Through extensive qualitative and quantitative evaluation, we demonstrate that the proposed framework generalizes well to real-world images, achieving high-quality shape and material estimation and relighting. We assess these synthetically relit images with photometric stereo benchmark methods for their physical correctness and resulting normal estimation accuracy, paving the way towards single-shot photometric stereo through physically-based relighting. This work allows us to address the single image-based inverse rendering problem holistically, applying well to both synthetic and real data and taking a step towards mitigating the challenge of data acquisition in photometric stereo.

Submitted 1 September, 2024; originally announced September 2024.

Comments: Accepted in ECCV 2024

arXiv:2408.10207

A Comprehensive Survey on Diffusion Models and Their Applications

Authors: Md Manjurul Ahsan, Shivakumar Raman, Yingtao Liu, Zahed Siddique

Abstract: Diffusion Models are probabilistic models that create realistic samples by simulating the diffusion process, gradually adding and removing noise from data. These models have gained popularity in domains such as image processing, speech synthesis, and natural language processing due to their ability to produce high-quality samples. As Diffusion Models are adopted in various domains, existing literature reviews, which often focus on specific areas such as computer vision or medical imaging, may not serve a broader audience across multiple fields. Therefore, this review presents a comprehensive overview of Diffusion Models, covering their theoretical foundations and algorithmic innovations. We highlight their applications in diverse areas such as media quality, authenticity, synthesis, image transformation, healthcare, and more. By consolidating current knowledge and identifying emerging trends, this review aims to facilitate a deeper understanding and broader adoption of Diffusion Models and provide guidelines for future researchers and practitioners across diverse disciplines.

Submitted 1 July, 2024; originally announced August 2024.
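The survey summarizes diffusion models as gradually adding noise to data and learning to remove it. As one concrete instance, chosen here for illustration rather than taken from the survey, the DDPM formulation of Ho et al. (2020) gives the forward noising step in closed form: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 - ᾱ_t)·ε with ε ~ N(0, I), where ᾱ_t is the cumulative product of (1 - β_s). The sketch uses the linear β schedule from 1e-4 to 0.02 over 1000 steps reported in that paper, applied to a toy list of floats; the learned reverse (denoising) network is omitted.

```python
import math
import random

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t increasing linearly across timesteps."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form for a list of floats x0."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
abar = alpha_bars(linear_beta_schedule())
x0 = [1.0, -1.0, 0.5]
early = q_sample(x0, 10, abar, rng)  # still dominated by the signal x0
late = q_sample(x0, 999, abar, rng)  # almost pure Gaussian noise
print(abar[10], abar[999])
```

Because ᾱ_t shrinks toward zero, the signal coefficient sqrt(ᾱ_t) fades while the noise coefficient approaches 1; a trained model learns to reverse this trajectory step by step.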

arXiv:2408.04805

Improved Robustness for Deep Learning-based Segmentation of Multi-Center Myocardial Perfusion MRI Datasets Using Data Adaptive Uncertainty-guided Space-time Analysis

Authors: Dilek M. Yalcinkaya, Khalid Youssef, Bobak Heydari, Janet Wei, Noel Bairey Merz, Robert Judd, Rohan Dharmakumar, Orlando P. Simonetti, Jonathan W. Weinsaft, Subha V. Raman, Behzad Sharif

Abstract: Background. Fully automatic analysis of myocardial perfusion MRI datasets enables rapid and objective reporting of stress/rest studies in patients with suspected ischemic heart disease. Developing deep learning techniques that can analyze multi-center datasets despite limited training data and variations in software and hardware is an ongoing challenge. Methods. Datasets from 3 medical centers acquired at 3T (n = 150 subjects) were included: an internal dataset (inD; n = 95) and two external datasets (exDs; n = 55) used for evaluating the robustness of the trained deep neural network (DNN) models against differences in pulse sequence (exD-1) and scanner vendor (exD-2). A subset of inD (n = 85) was used for training/validation of a pool of DNNs for segmentation, all using the same spatiotemporal U-Net architecture and hyperparameters but with different parameter initializations. We employed a space-time sliding-patch analysis approach that automatically yields a pixel-wise "uncertainty map" as a byproduct of the segmentation process. In our approach, a given test case is segmented by all members of the DNN pool, and the resulting uncertainty maps are used to automatically select the "best" solution from the pool. Results. The proposed DAUGS analysis approach performed similarly to the established approach on the internal dataset (p = n.s.), whereas it significantly outperformed it on the external datasets (p < 0.005 for exD-1 and exD-2). Moreover, the number of image series with "failed" segmentation was significantly lower for the proposed vs. the established approach (4.3% vs. 17.1%, p < 0.0005). Conclusions. The proposed DAUGS analysis approach has the potential to improve the robustness of deep learning methods for segmentation of multi-center stress perfusion datasets with variations in the choice of pulse sequence, site location or scanner vendor.

Submitted 8 August, 2024; originally announced August 2024.

Comments: Accepted for publication in JCMR, 2024
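In DAUGS, every DNN in the pool segments the test case, and the pixel-wise uncertainty maps decide which result to keep. The abstract does not state the exact selection criterion, so the sketch below assumes, hypothetically, that the candidate whose uncertainty map has the lowest mean wins. The data format, names, and values are illustrative only.

```python
from statistics import fmean

def select_by_uncertainty(candidates):
    """Pick the candidate segmentation whose uncertainty map has the lowest mean.

    `candidates` is a list of (segmentation, uncertainty_map) pairs, where each
    map is a 2D list of per-pixel uncertainty values (hypothetical format).
    """
    def mean_uncertainty(pair):
        _, umap = pair
        return fmean(v for row in umap for v in row)

    best_index = min(range(len(candidates)), key=lambda i: mean_uncertainty(candidates[i]))
    return best_index, candidates[best_index][0]

# Hypothetical pool of three DNN outputs with toy 2x2 uncertainty maps.
pool = [
    ("seg_from_model_0", [[0.4, 0.5], [0.6, 0.3]]),
    ("seg_from_model_1", [[0.1, 0.2], [0.1, 0.2]]),
    ("seg_from_model_2", [[0.3, 0.3], [0.3, 0.3]]),
]
print(select_by_uncertainty(pool))  # (1, 'seg_from_model_1')
```

A mean over the whole map is only one possible aggregate. Restricting it to pixels near the predicted boundary, or using a high percentile instead, would be equally plausible variants.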

arXiv:2407.09294

SS-SfP: Neural Inverse Rendering for Self Supervised Shape from (Mixed) Polarization

Authors: Ashish Tiwari, Shanmuganathan Raman

Abstract: We present a novel inverse rendering-based framework to estimate the 3D shape (per-pixel surface normals and depth) of objects and scenes from single-view polarization images, the problem popularly known as Shape from Polarization (SfP). Existing physics-based and learning-based methods for SfP operate under certain restrictions, namely (a) purely diffuse or purely specular reflections, which are rare on real surfaces, (b) the availability of ground-truth surface normals for direct supervision, which are hard to acquire and limited by the scanner's resolution, and (c) a known refractive index. To overcome these restrictions, we start by learning to separate the partially-polarized diffuse and specular reflection components, which we call reflectance cues, based on a modified polarization reflection model, and then estimate shape under mixed polarization through an inverse-rendering based self-supervised deep learning framework called SS-SfP, guided by the polarization data and estimated reflectance cues. Furthermore, we obtain the refractive index as a non-linear least squares solution. Through extensive quantitative and qualitative evaluation, we establish the efficacy of the proposed framework on simple single-object scenes from the DeepSfP dataset and complex in-the-wild scenes from the SPW dataset in an entirely self-supervised setting. To the best of our knowledge, this is the first learning-based approach to address SfP under mixed polarization in a completely self-supervised framework.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Published in Pacific Graphics 2023

arXiv:2405.13832

Federated Learning in Healthcare: Model Misconducts, Security, Challenges, Applications, and Future Research Directions -- A Systematic Review

Authors: Md Shahin Ali, Md Manjurul Ahsan, Lamia Tasnim, Sadia Afrin, Koushik Biswas, Md Maruf Hossain, Md Mahfuz Ahmed, Ronok Hashan, Md Khairul Islam, Shivakumar Raman

Abstract: Data privacy has become a major concern in healthcare due to the increasing digitization of medical records and data-driven medical research. Protecting sensitive patient information from breaches and unauthorized access is critical, as such incidents can have severe legal and ethical complications. Federated Learning (FL) addresses this concern by enabling multiple healthcare institutions to collaboratively learn from decentralized data without sharing it. FL's scope in healthcare covers areas such as disease prediction, treatment customization, and clinical trial research. However, implementing FL poses challenges, including model convergence in non-IID (independent and identically distributed) data environments, communication overhead, and managing multi-institutional collaborations. A systematic review of FL in healthcare is necessary to evaluate how effectively FL can provide privacy while maintaining the integrity and usability of medical data analysis. In this study, we analyze existing literature on FL applications in healthcare. We explore the current state of model security practices, identify prevalent challenges, and discuss practical applications and their implications. Additionally, the review highlights promising future research directions to refine FL implementations, enhance data security protocols, and expand FL's use to broader healthcare applications, which will benefit future researchers and practitioners.

Submitted 22 May, 2024; originally announced May 2024.
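In the setting this review describes, institutions share model updates and keep their data local. The sketch below shows server-side aggregation in the style of FedAvg, a widely used rule that averages client parameters weighted by local sample counts. It was picked for illustration and is not a method proposed by the review. The dict-of-lists parameter layout and the client names are hypothetical. The sketch also leaves out the non-IID convergence and communication issues the abstract raises.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # hypothetical layout: tensor name -> flat values

def fed_avg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Weighted average of client parameters (FedAvg-style aggregation).

    Each client's parameters are weighted by its number of local samples.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    global_params: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        global_params[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return global_params

hospital_a = {"w": [1.0, 0.0], "b": [0.5]}
hospital_b = {"w": [3.0, 2.0], "b": [1.5]}
print(fed_avg([hospital_a, hospital_b], [100, 300]))
```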

arXiv:2405.10925 [pdf]

High-dimensional multiple imputation (HDMI) for partially observed confounders including natural language processing-derived auxiliary covariates

Authors: Janick Weberpals, Pamela A. Shaw, Kueiyu Joshua Lin, Richard Wyss, Joseph M Plasek, Li Zhou, Kerry Ngan, Thomas DeRamus, Sudha R. Raman, Bradley G. Hammill, Hana Lee, Sengwee Toh, John G. Connolly, Kimberly J. Dandreo, Fang Tian, Wei Liu, Jie Li, José J. Hernández-Muñoz, Sebastian Schneeweiss, Rishi J. Desai

Abstract: Multiple imputation (MI) models can be improved by including auxiliary covariates (AC), but their performance in high-dimensional data is not well understood. We aimed to develop and compare high-dimensional MI (HDMI) approaches using structured and natural language processing (NLP)-derived AC in studies with partially observed confounders. We conducted a plasmode simulation study using data from opioid vs. non-steroidal anti-inflammatory drug (NSAID) initiators (X) with observed serum creatinine labs (Z2) and time-to-acute kidney injury as outcome. We simulated 100 cohorts with a null treatment effect, including X, Z2, atrial fibrillation (U), and 13 other investigator-derived confounders (Z1) in the outcome generation. We then imposed missingness (MZ2) on 50% of Z2 measurements as a function of Z2 and U and created different HDMI candidate AC using structured and NLP-derived features. We mimicked scenarios where U was unobserved by omitting it from all AC candidate sets. Using LASSO, we data-adaptively selected HDMI covariates associated with Z2 and MZ2 for MI, and with U to include in propensity score models. The treatment effect was estimated following propensity score matching in MI datasets and we benchmarked HDMI approaches against a baseline imputation and complete case analysis with Z1 only. HDMI using claims data showed the lowest bias (0.072). Combining claims and sentence embeddings led to an improvement in the efficiency displaying the lowest root-mean-squared-error (0.173) and coverage (94%). NLP-derived AC alone did not perform better than baseline MI. HDMI approaches may decrease bias in studies with partially observed confounders where missingness depends on unobserved factors.

Submitted 17 May, 2024; originally announced May 2024.
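The abstract estimates the treatment effect separately within each multiply imputed dataset. The usual way to combine such per-dataset estimates is Rubin's rules, although the abstract does not say which pooling procedure the authors used, so treat this as an assumption. The minimal sketch below pools point estimates and their variances. The example numbers are hypothetical and are not results from the study.

```python
import statistics
from typing import List, Tuple

def pool_rubin(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Pool per-imputation estimates with Rubin's rules.

    Returns the pooled point estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputed datasets")
    q_bar = statistics.fmean(estimates)     # pooled point estimate
    u_bar = statistics.fmean(variances)     # mean within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance (n - 1)
    total = u_bar + (1 + 1 / m) * b         # total variance
    return q_bar, total

# Hypothetical log-hazard-ratio estimates from five imputed datasets.
print(pool_rubin([0.02, -0.01, 0.03, 0.00, 0.01], [0.004] * 5))
```

The (1 + 1/m) factor accounts for using a finite number of imputations. As the spread of the per-dataset estimates grows, the between-imputation term comes to dominate the total variance.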

arXiv:2312.06317 [pdf, ps, other]

Flow Symmetrization for Parameterized Constrained Diffeomorphisms

Authors: Aalok Gangopadhyay, Dwip Dalal, Progyan Das, Shanmuganathan Raman

Abstract: Diffeomorphisms play a crucial role while searching for shapes with fixed topological properties, allowing for smooth deformation of template shapes. Several approaches use diffeomorphism for shape search. However, these approaches employ only unconstrained diffeomorphisms. In this work, we develop Flow Symmetrization - a method to represent a parametric family of constrained diffeomorphisms that contain additional symmetry constraints such as periodicity, rotation equivariance, and transflection equivariance. Our representation is differentiable in nature, making it suitable for gradient-based optimization approaches for shape search. As these symmetry constraints naturally arise in tiling classes, our method is ideal for representing tile shapes belonging to any tiling class. To demonstrate the efficacy of our method, we design two frameworks for addressing the challenging problems of Escherization and Density Estimation. The first framework is dedicated to the Escherization problem, where we parameterize tile shapes belonging to different isohedral classes. Given a target shape, the template tile is deformed using gradient-based optimization to resemble the target shape. The second framework focuses on density estimation in identification spaces. By leveraging the inherent link between tiling theory and identification topology, we design constrained diffeomorphisms for the plane that result in unconstrained diffeomorphisms on the identification spaces. Specifically, we perform density estimation on identification spaces such as torus, sphere, Klein bottle, and projective plane. Through results and experiments, we demonstrate that our method obtains impressive results for Escherization on the Euclidean plane and density estimation on non-Euclidean identification spaces. Code and results: https://dwipddalal.github.io/FlowSymmetry/

Submitted 21 July, 2025; v1 submitted 11 December, 2023; originally announced December 2023.
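Periodicity is one of the constraints the abstract lists. It links diffeomorphisms of the plane to diffeomorphisms of the torus: a map satisfying f(x + 1) = f(x) + 1 on each axis is compatible with the identification. The sketch below uses a textbook closed-form warp chosen only to illustrate that constraint. It is not the Flow Symmetrization construction, and the function names are hypothetical.

```python
import math

def periodic_warp(x: float, a: float) -> float:
    """Monotone warp of the real line that commutes with integer shifts.

    f(x) = x + a * sin(2*pi*x) / (2*pi) satisfies f(x + 1) = f(x) + 1, and
    f'(x) = 1 + a*cos(2*pi*x) > 0 whenever |a| < 1, so f is a diffeomorphism
    of R that descends to a diffeomorphism of the circle R/Z.
    """
    if abs(a) >= 1:
        raise ValueError("|a| must be < 1 for invertibility")
    return x + a * math.sin(2 * math.pi * x) / (2 * math.pi)

def torus_warp(x: float, y: float, a: float, b: float) -> tuple:
    """Warp each axis independently; the result descends to the torus R^2/Z^2."""
    return periodic_warp(x, a), periodic_warp(y, b)

print(torus_warp(0.25, 0.75, 0.5, -0.5))
```

In a learned version, `a` and `b` would be trainable parameters. The |a| < 1 bound is the kind of constraint that has to hold throughout gradient-based optimization for the map to remain invertible.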

arXiv:2311.11988 [pdf, other]

Categorizing the Visual Environment and Analyzing the Visual Attention of Dogs

Authors: Shreyas Sundara Raman, Madeline H. Pelgrim, Daphna Buchsbaum, Thomas Serre

Abstract: Dogs have a unique evolutionary relationship with humans and serve many important roles e.g. search and rescue, blind assistance, emotional support. However, few datasets exist to categorize visual features and objects available to dogs, as well as how dogs direct their visual attention within their environment. We collect and study a dataset with over 11,698 gazes to categorize the objects available to be gazed at by 11 dogs in everyday outdoor environments i.e. a walk around a college campus and urban area. We explore the availability of these object categories and the visual attention of dogs over these categories using a head mounted eye tracking apparatus. A small portion (approx. 600 images or < 20% of total dataset) of the collected data is used to fine tune a MaskRCNN for the novel image domain to segment objects present in the scene, enabling further statistical analysis on the visual gaze tendencies of dogs. The MaskRCNN, with eye tracking apparatus, serves as an end to end model for automatically classifying the visual fixations of dogs. The fine tuned MaskRCNN performs far better than chance. There are few individual differences between the 11 dogs and we observe greater visual fixations on buses, plants, pavement, and construction equipment. This work takes a step towards understanding visual behavior of dogs and their interaction with the physical world.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 13 pages, 11 figures, 1 table, WACV CV4Smalls Workshop

arXiv:2311.04942 [pdf, other]

CSAM: A 2.5D Cross-Slice Attention Module for Anisotropic Volumetric Medical Image Segmentation

Authors: Alex Ling Yu Hung, Haoxin Zheng, Kai Zhao, Xiaoxi Du, Kaifeng Pang, Qi Miao, Steven S. Raman, Demetri Terzopoulos, Kyunghyun Sung

Abstract: A large portion of volumetric medical data, especially magnetic resonance imaging (MRI) data, is anisotropic, as the through-plane resolution is typically much lower than the in-plane resolution. Both 3D and purely 2D deep learning-based segmentation methods are deficient in dealing with such volumetric data since the performance of 3D methods suffers when confronting anisotropic data, and 2D methods disregard crucial volumetric information. Insufficient work has been done on 2.5D methods, in which 2D convolution is mainly used in concert with volumetric information. These models focus on learning the relationship across slices, but typically have many parameters to train. We offer a Cross-Slice Attention Module (CSAM) with minimal trainable parameters, which captures information across all the slices in the volume by applying semantic, positional, and slice attention on deep feature maps at different scales. Our extensive experiments using different network architectures and tasks demonstrate the usefulness and generalizability of CSAM. Associated code is available at https://github.com/aL3x-O-o-Hung/CSAM.

Submitted 26 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.16532 [pdf, other]

Learning Robust Deep Visual Representations from EEG Brain Recordings

Authors: Prajwal Singh, Dwip Dalal, Gautam Vashishtha, Krishna Miyapuram, Shanmuganathan Raman

Abstract: Decoding the human brain has been a hallmark of neuroscientists and Artificial Intelligence researchers alike. Reconstruction of visual images from brain Electroencephalography (EEG) signals has garnered a lot of interest due to its applications in brain-computer interfacing. This study proposes a two-stage method where the first step is to obtain EEG-derived features for robust learning of deep representations and subsequently utilize the learned representation for image generation and classification. We demonstrate the generalizability of our feature extraction pipeline across three different datasets using deep-learning architectures with supervised and contrastive learning methods. We have performed the zero-shot EEG classification task to support the generalizability claim further. We observed that a subject invariant linearly separable visual representation was learned using EEG data alone in an unimodal setting that gives better k-means accuracy as compared to a joint representation learning between EEG and images. Finally, we propose a novel framework to transform unseen images into the EEG space and reconstruct them with approximation, showcasing the potential for image reconstruction from EEG signals. Our proposed image synthesis method from EEG shows 62.9% and 36.13% inception score improvement on the EEGCVPR40 and the Thoughtviz datasets, which is better than state-of-the-art performance in GAN.

Submitted 25 October, 2023; originally announced October 2023.

Comments: Accepted in WACV 2024
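The abstract uses "k-means accuracy" to compare representations. Cluster ids produced by k-means are arbitrary, so one common way to score a clustering is to take the best one-to-one mapping from clusters to class labels. Whether the paper scores it this way is an assumption. The sketch below brute-forces that mapping, which is only practical for a handful of classes. Larger problems typically use the Hungarian algorithm.

```python
from itertools import permutations
from typing import Sequence

def clustering_accuracy(labels: Sequence[int], clusters: Sequence[int]) -> float:
    """Best accuracy over all one-to-one mappings from cluster ids to labels.

    Brute force over permutations, so only practical for a small number of
    classes; larger problems usually use the Hungarian algorithm instead.
    """
    if not labels or len(labels) != len(clusters):
        raise ValueError("inputs must be non-empty and of equal length")
    label_ids = sorted(set(labels))
    cluster_ids = sorted(set(clusters))
    if len(cluster_ids) > len(label_ids):
        raise ValueError("more clusters than classes")
    best = 0
    for assignment in permutations(label_ids, len(cluster_ids)):
        mapping = dict(zip(cluster_ids, assignment))
        hits = sum(mapping[c] == y for c, y in zip(clusters, labels))
        best = max(best, hits)
    return best / len(labels)

print(clustering_accuracy([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1]))
```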

arXiv:2310.08645 [pdf, other]

Defect Analysis of 3D Printed Cylinder Object Using Transfer Learning Approaches

Authors: Md Manjurul Ahsan, Shivakumar Raman, Zahed Siddique

Abstract: Additive manufacturing (AM) is gaining attention across various industries like healthcare, aerospace, and automotive. However, identifying defects early in the AM process can reduce production costs and improve productivity - a key challenge. This study explored the effectiveness of machine learning (ML) approaches, specifically transfer learning (TL) models, for defect detection in 3D-printed cylinders. Images of cylinders were analyzed using models including VGG16, VGG19, ResNet50, ResNet101, InceptionResNetV2, and MobileNetV2. Performance was compared across two datasets using accuracy, precision, recall, and F1-score metrics. In the first study, VGG16, InceptionResNetV2, and MobileNetV2 achieved perfect scores. In contrast, ResNet50 had the lowest performance, with an average F1-score of 0.32. Similarly, in the second study, MobileNetV2 correctly classified all instances, while ResNet50 struggled with more false positives and fewer true positives, resulting in an F1-score of 0.75. Overall, the findings suggest certain TL models like MobileNetV2 can deliver high accuracy for AM defect classification, although performance varies across algorithms. The results provide insights into model optimization and integration needs for reliable automated defect analysis during 3D printing. By identifying the top-performing TL techniques, this study aims to enhance AM product quality through robust image-based monitoring and inspection.

Submitted 12 October, 2023; originally announced October 2023.
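The abstract compares models by precision, recall and F1-score, and it reports an "average F1-score". The sketch below computes these metrics from confusion counts and takes an unweighted (macro) average over classes. Reading "average" as macro average is an assumption, and the counts are hypothetical.

```python
from typing import List, Tuple

def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from confusion counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def macro_f1(per_class_counts: List[Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-class F1 scores, given (tp, fp, fn) per class."""
    scores = [precision_recall_f1(*counts)[2] for counts in per_class_counts]
    return sum(scores) / len(scores)

# Hypothetical counts for a "defective" and a "non-defective" class.
print(macro_f1([(30, 10, 10), (45, 5, 15)]))
```

The sketch shows why false positives and missed true positives, the failure modes the abstract attributes to ResNet50, both pull F1 down: false positives lower precision, and missed positives lower recall.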

arXiv:2309.09919 [pdf, other]

Plug in the Safety Chip: Enforcing Constraints for LLM-driven Robot Agents

Authors: Ziyi Yang, Shreyas S. Raman, Ankit Shah, Stefanie Tellex

Abstract: Recent advancements in large language models (LLMs) have enabled a new research domain, LLM agents, for solving robotics and planning tasks by leveraging the world knowledge and general reasoning abilities of LLMs obtained during pretraining. However, while considerable effort has been made to teach the robot the "dos," the "don'ts" received relatively less attention. We argue that, for any practical usage, it is as crucial to teach the robot the "don'ts": conveying explicit instructions about prohibited actions, assessing the robot's comprehension of these restrictions, and, most importantly, ensuring compliance. Moreover, verifiable safe operation is essential for deployments that satisfy worldwide standards such as ISO 61508, which defines standards for safely deploying robots in industrial factory environments worldwide. Aiming at deploying the LLM agents in a collaborative environment, we propose a queryable safety constraint module based on linear temporal logic (LTL) that simultaneously enables natural language (NL) to temporal constraints encoding, safety violation reasoning and explaining, and unsafe action pruning. To demonstrate the effectiveness of our system, we conducted experiments in VirtualHome environment and on a real robot. The experimental results show that our system strictly adheres to the safety constraints and scales well with complex safety constraints, highlighting its potential for practical utility.

Submitted 28 November, 2023; v1 submitted 18 September, 2023; originally announced September 2023.
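The abstract describes an LTL-based module that, among other things, prunes unsafe actions. The sketch below hand-codes a runtime monitor for two common LTL safety patterns, "never a" (G ¬a) and "not a until b" (¬a W b, weak until). It then filters candidate actions proposed by a planner. It is not the paper's module, which encodes natural-language constraints into LTL automatically. The action names are hypothetical, and the monitor assumes one action per step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class SafetyMonitor:
    """Runtime monitor for two LTL safety patterns over single-action steps.

    never: G !a      (action a is never allowed)
    until: (!a) W b  (action a is forbidden until action b has occurred)
    """
    never: Set[str] = field(default_factory=set)
    until: Dict[str, str] = field(default_factory=dict)  # forbidden -> enabling action
    history: Set[str] = field(default_factory=set)

    def allowed(self, action: str) -> bool:
        if action in self.never:
            return False
        enabler = self.until.get(action)
        return enabler is None or enabler in self.history

    def step(self, action: str) -> None:
        """Execute an action, refusing any that would violate a constraint."""
        if not self.allowed(action):
            raise RuntimeError(f"unsafe action: {action}")
        self.history.add(action)

    def prune(self, candidates: List[str]) -> List[str]:
        """Drop candidate actions that would violate a constraint right now."""
        return [a for a in candidates if self.allowed(a)]

monitor = SafetyMonitor(never={"grab_knife"}, until={"leave_room": "turn_off_stove"})
print(monitor.prune(["grab_knife", "leave_room", "turn_off_stove"]))
monitor.step("turn_off_stove")
print(monitor.prune(["grab_knife", "leave_room"]))
```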

arXiv:2308.13488 [pdf, other]

doi 10.1007/978-3-031-43898-1_44

Temporal Uncertainty Localization to Enable Human-in-the-loop Analysis of Dynamic Contrast-enhanced Cardiac MRI Datasets

Authors: Dilek M. Yalcinkaya, Khalid Youssef, Bobak Heydari, Orlando Simonetti, Rohan Dharmakumar, Subha Raman, Behzad Sharif

Abstract: Dynamic contrast-enhanced (DCE) cardiac magnetic resonance imaging (CMRI) is a widely used modality for diagnosing myocardial blood flow (perfusion) abnormalities. During a typical free-breathing DCE-CMRI scan, close to 300 time-resolved images of myocardial perfusion are acquired at various contrast "wash in/out" phases. Manual segmentation of myocardial contours in each time-frame of a DCE image series can be tedious and time-consuming, particularly when non-rigid motion correction has failed or is unavailable. While deep neural networks (DNNs) have shown promise for analyzing DCE-CMRI datasets, a "dynamic quality control" (dQC) technique for reliably detecting failed segmentations is lacking. Here we propose a new space-time uncertainty metric as a dQC tool for DNN-based segmentation of free-breathing DCE-CMRI datasets by validating the proposed metric on an external dataset and establishing a human-in-the-loop framework to improve the segmentation results. In the proposed approach, we referred the top 10% most uncertain segmentations as detected by our dQC tool to the human expert for refinement. This approach resulted in a significant increase in the Dice score (p<0.001) and a notable decrease in the number of images with failed segmentation (16.2% to 11.3%) whereas the alternative approach of randomly selecting the same number of segmentations for human referral did not achieve any significant improvement. Our results suggest that the proposed dQC framework has the potential to accurately identify poor-quality segmentations and may enable efficient DNN-based analysis of DCE-CMRI in a human-in-the-loop pipeline for clinical interpretation and reporting of dynamic CMRI datasets.

Submitted 13 November, 2023; v1 submitted 25 August, 2023; originally announced August 2023.

Comments: Accepted for publication in MICCAI 2023
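The abstract refers the top 10% most uncertain segmentations to an expert and measures improvement with the Dice score. The sketch below covers only those two bookkeeping steps. The paper's space-time uncertainty metric is not reproduced, so the per-frame scores are placeholders. Representing masks as sets of pixel coordinates and rounding the referral count up are both assumptions made for brevity.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]

def dice(pred: Set[Pixel], truth: Set[Pixel]) -> float:
    """Dice similarity 2|A & B| / (|A| + |B|); defined as 1.0 when both are empty."""
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))

def refer_most_uncertain(uncertainty: List[float], top_percent: int = 10) -> List[int]:
    """Indices of the top_percent most uncertain frames, most uncertain first."""
    n = len(uncertainty)
    k = (n * top_percent + 99) // 100  # integer ceiling of n * top_percent / 100
    order = sorted(range(n), key=lambda i: uncertainty[i], reverse=True)
    return order[:k]

# Placeholder uncertainty scores for ten frames of a perfusion series.
scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.05, 0.4, 0.6, 0.7, 0.5]
print(refer_most_uncertain(scores), dice({(0, 0), (0, 1)}, {(0, 1), (1, 1)}))
```

The random-referral baseline mentioned in the abstract would draw the same number k of frames uniformly instead of sorting by uncertainty.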

arXiv:2307.08652 [pdf, other]

Search Me Knot, Render Me Knot: Embedding Search and Differentiable Rendering of Knots in 3D

Authors: Aalok Gangopadhyay, Paras Gupta, Tarun Sharma, Prajwal Singh, Shanmuganathan Raman

Abstract: We introduce the problem of knot-based inverse perceptual art. Given multiple target images and their corresponding viewing configurations, the objective is to find a 3D knot-based tubular structure whose appearance resembles the target images when viewed from the specified viewing configurations. To solve this problem, we first design a differentiable rendering algorithm for rendering tubular knots embedded in 3D for arbitrary perspective camera configurations. Utilizing this differentiable rendering algorithm, we search over the space of knot configurations to find the ideal knot embedding. We represent the knot embeddings via homeomorphisms of the desired template knot, where the homeomorphisms are parametrized by the weights of an invertible neural network. Our approach is fully differentiable, making it possible to find the ideal 3D tubular structure for the desired perceptual art using gradient-based optimization. We propose several loss functions that impose additional physical constraints, enforcing that the tube is free of self-intersection, lies within a predefined region in space, satisfies the physical bending limits of the tube material and the material cost is within a specified budget. We demonstrate through results that our knot representation is highly expressive and gives impressive results even for challenging target images in both single view as well as multiple view constraints. Through extensive ablation study we show that each of the proposed loss function is effective in ensuring physical realizability. We construct a real world 3D-printed object to demonstrate the practical utility of our approach. To the best of our knowledge, we are the first to propose a fully differentiable optimization framework for knot-based inverse perceptual art.

Submitted 19 August, 2023; v1 submitted 17 July, 2023; originally announced July 2023.
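One of the abstract's physical constraints is that the tube must be free of self-intersection. A tube of radius r intersects itself when two parts of its centerline that are far apart along the curve come within 2r of each other in space. The sketch below penalizes such pairs on a sampled closed centerline using a squared hinge. The abstract does not give the paper's loss, so the penalty form and the `skip` window are assumptions. The skip window is a crude way to ignore neighbouring samples and does not account for very tight bends.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]

def self_intersection_penalty(centerline: Sequence[Point3], radius: float, skip: int) -> float:
    """Squared-hinge penalty for a closed tube with a sampled centerline.

    Pairs of samples fewer than `skip` steps apart along the cyclic curve are
    ignored, since neighbouring samples are always close; every other pair
    contributes max(0, 2*radius - distance)**2.
    """
    n = len(centerline)
    penalty = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            gap = min(j - i, n - (j - i))  # cyclic index separation
            if gap < skip:
                continue
            d = math.dist(centerline[i], centerline[j])
            penalty += max(0.0, 2 * radius - d) ** 2
    return penalty

ts = [2 * math.pi * i / 100 for i in range(100)]
circle = [(math.cos(t), math.sin(t), 0.0) for t in ts]
figure_eight = [(math.sin(t), math.sin(t) * math.cos(t), 0.0) for t in ts]
print(self_intersection_penalty(circle, 0.1, 10), self_intersection_penalty(figure_eight, 0.1, 10))
```

The circle gives zero penalty. The planar figure-eight crosses itself at the origin, so it is penalized. Because the hinge is differentiable almost everywhere, a term like this can be added to a gradient-based objective.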

arXiv:2307.02814 [pdf, other]

Single Image LDR to HDR Conversion using Conditional Diffusion

Authors: Dwip Dalal, Gautam Vashishtha, Prajwal Singh, Shanmuganathan Raman

Abstract: Digital imaging aims to replicate realistic scenes, but Low Dynamic Range (LDR) cameras cannot represent the wide dynamic range of real scenes, resulting in under-/overexposed images. This paper presents a deep learning-based approach for recovering intricate details from shadows and highlights while reconstructing High Dynamic Range (HDR) images. We formulate the problem as an image-to-image (I2I) translation task and propose a conditional Denoising Diffusion Probabilistic Model (DDPM) based framework using classifier-free guidance. We incorporate a deep CNN-based autoencoder in our proposed framework to enhance the quality of the latent representation of the input LDR image used for conditioning. Moreover, we introduce a new loss function for LDR-HDR translation tasks, termed Exposure Loss. This loss helps direct gradients in the opposite direction of the saturation, further improving the results' quality. By conducting comprehensive quantitative and qualitative experiments, we have effectively demonstrated the proficiency of our proposed method. The results indicate that a simple conditional diffusion-based method can replace the complex camera pipeline-based architectures.

Submitted 6 July, 2023; originally announced July 2023.

Journal ref: IEEE International Conference on Image Processing 2023
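In classifier-free guidance, a single denoising network predicts noise both with and without the conditioning signal (in this paper, the encoded LDR latent). The two predictions are then extrapolated to strengthen the conditioning. The sketch below shows only that combination step, in the (1 + w)·ε_cond − w·ε_uncond form. The abstract does not state the guidance scale the authors used, so `w` is a free parameter here. Plain lists stand in for noise tensors so the example stays self-contained.

```python
from typing import List

def guided_noise(eps_cond: List[float], eps_uncond: List[float], w: float) -> List[float]:
    """Classifier-free guidance: (1 + w) * eps_cond - w * eps_uncond.

    w = 0 recovers the purely conditional prediction; larger w pushes
    samples further toward the conditioning signal.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("shape mismatch")
    return [(1 + w) * c - w * u for c, u in zip(eps_cond, eps_uncond)]

print(guided_noise([1.0, 2.0], [0.0, 1.0], 2.0))
```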

arXiv:2306.13452 [pdf, other]

A Graph Neural Network Approach for Temporal Mesh Blending and Correspondence

Authors: Aalok Gangopadhyay, Abhinav Narayan Harish, Prajwal Singh, Shanmuganathan Raman

Abstract: We have proposed a self-supervised deep learning framework for solving the mesh blending problem in scenarios where the meshes are not in correspondence. To solve this problem, we have developed Red-Blue MPNN, a novel graph neural network that processes an augmented graph to estimate the correspondence. We have designed a novel conditional refinement scheme to find the exact correspondence when certain conditions are satisfied. We further develop a graph neural network that takes the aligned meshes and the time value as input and fuses this information to process further and generate the desired result. Using motion capture datasets and human mesh designing software, we create a large-scale synthetic dataset consisting of temporal sequences of human meshes in motion. Our results demonstrate that our approach generates realistic deformation of body parts given complex inputs.

Submitted 23 June, 2023; originally announced June 2023.

arXiv:2305.20077 [pdf, other]

Managed Geo-Distributed Feature Store: Architecture and System Design

Authors: Anya Li, Bhala Ranganathan, Feng Pan, Mickey Zhang, Qianjun Xu, Runhan Li, Sethu Raman, Shail Paragbhai Shah, Vivienne Tang

Abstract: Companies are using machine learning to solve real-world problems and are developing hundreds to thousands of features in the process. They are building feature engineering pipelines as part of MLOps life cycle to transform data from various data sources and materialize the same for future consumption. Without feature stores, different teams across various business groups would maintain the above process independently, which can lead to conflicting and duplicated features in the system. Data scientists find it hard to search for and reuse existing features and it is painful to maintain version control. Furthermore, feature correctness violations related to online (inferencing) - offline (training) skews and data leakage are common. Although the machine learning community has extensively discussed the need for feature stores and their purpose, this paper aims to capture the core architectural components that make up a managed feature store and to share the design learning in building such a system.

Submitted 31 May, 2023; originally announced May 2023.

Comments: All the authors are from the AzureML Feature Store product group and are listed in alphabetical order. Bhala Ranganathan: System architect and tech lead of AzureML Feature Store. Feng Pan, Qianjun Xu: Engineering managers. Sethu Raman: Product Manager of AzureML Feature Store who structured and organized the product vision and specifications
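The abstract names data leakage as a common source of feature-correctness problems. A standard safeguard in feature stores is the point-in-time-correct join. Each training label is paired with the most recent feature value observed at or before the label's timestamp, so no future information leaks into training. The sketch below illustrates that idea. It is not the AzureML Feature Store API, and the data layout and entity names are hypothetical.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: entity -> [(timestamp, value)], sorted by timestamp.
FeatureLog = Dict[str, List[Tuple[int, float]]]

def point_in_time_lookup(log: FeatureLog, entity: str, as_of: int) -> Optional[float]:
    """Latest feature value observed at or before `as_of` (no future leakage)."""
    history = log.get(entity, [])
    times = [t for t, _ in history]
    idx = bisect.bisect_right(times, as_of) - 1
    return history[idx][1] if idx >= 0 else None

def build_training_rows(
    log: FeatureLog, labels: List[Tuple[str, int, float]]
) -> List[Tuple[str, int, Optional[float], float]]:
    """Join each (entity, label_time, label) with its feature value as of label_time."""
    return [(e, t, point_in_time_lookup(log, e, t), y) for e, t, y in labels]

log = {"user1": [(10, 0.1), (20, 0.2), (30, 0.3)]}
print(build_training_rows(log, [("user1", 25, 1.0), ("user2", 15, 0.0)]))
```

A naive join that takes the latest value would attach 0.3, observed at time 30, to a label from time 25. That is the kind of offline-training leakage the abstract warns about.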

arXiv:2305.09777 [pdf, other]

BSGAN: A Novel Oversampling Technique for Imbalanced Pattern Recognitions

Authors: Md Manjurul Ahsan, Shivakumar Raman, Zahed Siddique

Abstract: Class imbalanced problems (CIP) are one of the potential challenges in developing unbiased Machine Learning (ML) models for predictions. CIP occurs when data samples are not equally distributed between the two or multiple classes. Borderline-Synthetic Minority Oversampling Techniques (SMOTE) is one of the approaches that has been used to balance the imbalance data by oversampling the minor (limited) samples. One of the potential drawbacks of existing Borderline-SMOTE is that it focuses on the data samples that lay at the border point and gives more attention to the extreme observations, ultimately limiting the creation of more diverse data after oversampling, and that is the almost scenario for the most of the borderline-SMOTE based oversampling strategies. As an effect, marginalization occurs after oversampling. To address these issues, in this work, we propose a hybrid oversampling technique by combining the power of borderline SMOTE and Generative Adversarial Network to generate more diverse data that follow Gaussian distributions. We named it BSGAN and tested it on four highly imbalanced datasets: Ecoli, Wine quality, Yeast, and Abalone. Our preliminary computational results reveal that BSGAN outperformed existing borderline SMOTE and GAN-based oversampling techniques and created a more diverse dataset that follows normal distribution after oversampling effect.

Submitted 16 May, 2023; originally announced May 2023.

arXiv:2305.08189 [pdf, other]

doi 10.56553/popets-2023-0073

CERTainty: Detecting DNS Manipulation at Scale using TLS Certificates

Authors: Elisa Tsai, Deepak Kumar, Ram Sundara Raman, Gavin Li, Yael Eiger, Roya Ensafi

Abstract: DNS manipulation is an increasingly common technique used by censors and other network adversaries to prevent users from accessing restricted Internet resources and hijack their connections. Prior work in detecting DNS manipulation relies largely on comparing DNS resolutions with trusted control results to identify inconsistencies. However, the emergence of CDNs and other cloud providers practicing content localization and load balancing leads to these heuristics being inaccurate, paving the need for more verifiable signals of DNS manipulation. In this paper, we develop a new technique, CERTainty, that utilizes the widely established TLS certificate ecosystem to accurately detect DNS manipulation, and obtain more information about the adversaries performing such manipulation. We find that untrusted certificates, mismatching hostnames, and blockpages are powerful proxies for detecting DNS manipulation. Our results show that previous work using consistency-based heuristics is inaccurate, allowing for 72.45% false positives in the cases detected as DNS manipulation. Further, we identify 17 commercial DNS filtering products in 52 countries, including products such as SafeDNS, SkyDNS, and Fortinet, and identify the presence of 55 ASes in 26 countries that perform ISP-level DNS manipulation. We also identify 226 new blockpage clusters that are not covered by previous research. We are integrating techniques used by CERTainty into active measurement platforms to continuously and accurately monitor DNS manipulation.

Submitted 14 May, 2023; originally announced May 2023.

Comments: To Appear in: Privacy Enhancing Technologies Symposium (PETS), July 2023
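The three certificate-based signals named in this abstract can be illustrated with a small check over one TLS observation for a resolved domain. This is a minimal sketch under stated assumptions, not CERTainty's implementation: `TlsObservation`, `hostname_matches`, `manipulation_signals`, and `BLOCKPAGE_MARKERS` are hypothetical names, chain validation is assumed to have been done elsewhere by a TLS library, the wildcard rule is simplified, and the keyword match is a naive stand-in for real blockpage fingerprints.

```python
from dataclasses import dataclass

# Hypothetical blockpage keywords; a real system would use curated fingerprints.
BLOCKPAGE_MARKERS = ("access denied", "this site has been blocked")

@dataclass
class TlsObservation:
    hostname: str          # domain the client requested
    cert_names: list[str]  # DNS names on the certificate served by the resolved IP
    chain_trusted: bool    # result of chain validation (assumed done by a TLS library)
    body: str = ""         # HTTP body fetched from the resolved IP, if any

def hostname_matches(hostname: str, pattern: str) -> bool:
    """Simplified wildcard rule: '*' may only be the entire leftmost label."""
    host = hostname.lower().rstrip(".").split(".")
    pat = pattern.lower().rstrip(".").split(".")
    if len(host) != len(pat):
        return False
    if pat[0] == "*":
        # Refuse overly broad patterns such as '*.com'.
        return len(pat) > 2 and host[1:] == pat[1:]
    return host == pat

def manipulation_signals(obs: TlsObservation) -> list[str]:
    """Return which certificate-based signals of DNS manipulation this observation shows."""
    signals = []
    if not obs.chain_trusted:
        signals.append("untrusted_certificate")
    if not any(hostname_matches(obs.hostname, name) for name in obs.cert_names):
        signals.append("hostname_mismatch")
    if any(marker in obs.body.lower() for marker in BLOCKPAGE_MARKERS):
        signals.append("blockpage")
    return signals
```

An empty list means the served certificate is consistent with the requested domain. A CDN edge server returns exactly such a trusted, matching certificate even when its IP differs from the control resolution, which is why certificate checks avoid the localization problem that consistency-based heuristics run into.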

arXiv:2305.04401

Few Shot Learning for Medical Imaging: A Comparative Analysis of Methodologies and Formal Mathematical Framework

Authors: Jannatul Nayem, Sayed Sahriar Hasan, Noshin Amina, Bristy Das, Md Shahin Ali, Md Manjurul Ahsan, Shivakumar Raman

Abstract: Deep learning has become a leading approach to many machine learning tasks and has made breakthroughs in extracting features from unstructured data. Although it is increasingly used in medical image processing, the scarcity of problem-specific training data has become a major obstacle to applying deep learning in the medical sector. To work around this limited supply of data, researchers have developed models that can solve machine learning problems with fewer data, an approach called few-shot learning. Few-shot learning algorithms address data limitations by extracting characteristics from a small dataset through classification and segmentation methods. In the medical sector, available datasets are frequently scarce for some diseases whose data are kept confidential, so few-shot learning has drawn attention in this data-scarce setting. This chapter presents the background and a basic overview of few-shot learning, and then describes its classification. It also compares the methodological approaches that have been applied in medical image analysis over time, illustrates current advances in applying few-shot learning to medical imaging, and describes the future scope of this domain in the medical imaging sector.

Submitted 31 May, 2023; v1 submitted 7 May, 2023; originally announced May 2023.

Comments: Accepted as a Springer book chapter in the book "Data-driven approaches to Medical Imaging"
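Metric-based methods are one widely used family of few-shot learners. The sketch below shows a prototypical-network-style classifier: each class is summarized by the mean of its few support embeddings, and a query takes the label of the nearest class mean. It is an illustration we chose, not a method taken from the chapter; the embeddings are assumed to come from some pretrained image encoder (not shown), and `class_prototypes` and `classify` are hypothetical names.

```python
import math

def class_prototypes(support: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Average the K support embeddings of each class into a single prototype."""
    protos = {}
    for label, shots in support.items():
        protos[label] = [sum(coords) / len(shots) for coords in zip(*shots)]
    return protos

def classify(query: list[float], protos: dict[str, list[float]]) -> str:
    """Label the query with the class whose prototype is nearest (Euclidean distance)."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))

# Toy usage with made-up 2-D embeddings and two shots per class.
support = {"benign": [[0.0, 0.1], [0.2, 0.0]], "lesion": [[1.0, 0.9], [0.8, 1.1]]}
protos = class_prototypes(support)
print(classify([0.15, 0.1], protos))  # nearest prototype is "benign"
```

With only two labeled examples per class, the whole fitting step reduces to averaging, which is why this family suits settings where only a handful of annotated images per condition exist.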

arXiv:2304.10582

Invariant Scattering Transform for Medical Imaging

Authors: Md Manjurul Ahsan, Shivakumar Raman, Zahed Siddique

Abstract: Over the years, the Invariant Scattering Transform (IST) has become popular for medical image analysis, including the use of wavelet transforms computed with Convolutional Neural Networks (CNNs) to capture the scale and orientation of patterns in the input signal. IST aims to be invariant to transformations that are common in medical images, such as translation, rotation, scaling, and deformation. This invariance is used to improve performance in medical imaging applications such as segmentation, classification, and registration, which can be integrated into machine learning algorithms for disease detection, diagnosis, and treatment planning. Additionally, combining IST with deep learning approaches has the potential to leverage the strengths of both and improve medical image analysis outcomes. This study provides an overview of IST in medical imaging, covering the types of IST, their applications, their limitations, and potential directions for future researchers and practitioners.

Submitted 31 May, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

Comments: Accepted as a Springer book chapter in the book "Data-driven approaches to Medical Imaging"
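Why a modulus followed by averaging yields translation invariance can be seen in a deliberately tiny 1-D, first-order version of a scattering transform. The Haar-like filters, the scales, circular convolution, and the function names are our simplifications for illustration; practical ISTs typically use 2-D Morlet-type wavelets at several orientations and also compute higher-order coefficients.

```python
def circular_conv(x, h):
    """y[t] = sum_k h[k] * x[(t - k) mod n]; circular, so shifting x just shifts y."""
    n = len(x)
    return [sum(hk * x[(t - k) % n] for k, hk in enumerate(h)) for t in range(n)]

def haar_filter(scale):
    """Zero-mean Haar-like band-pass filter of width 2*scale (stand-in for a wavelet)."""
    return [1.0] * scale + [-1.0] * scale

def scattering_order1(x, scales=(1, 2, 4)):
    """Return [S0, S1 per scale] with S0 = mean(x) and S1_j = mean(|x * psi_j|)."""
    n = len(x)
    s0 = sum(x) / n
    s1 = [sum(abs(v) for v in circular_conv(x, haar_filter(j))) / n for j in scales]
    return [s0] + s1

# A signal and a circularly shifted copy produce the same coefficients.
x = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]
shifted = x[3:] + x[:3]
print(scattering_order1(x))
print(scattering_order1(shifted))
```

The modulus is what keeps S1 informative: each filter sums to zero, so averaging x * ψ_j without the modulus would always give 0, whereas averaging its magnitude keeps the energy at each scale while discarding where in the signal it occurred.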

arXiv:2302.10121

EEG2IMAGE: Image Reconstruction from EEG Brain Signals

Authors: Prajwal Singh, Pankaj Pandey, Krishna Miyapuram, Shanmuganathan Raman

Abstract: Reconstructing images from brain signals of imagined visuals may provide augmented vision to people with disabilities and advance Brain-Computer Interface (BCI) technology. Recent progress in deep learning has boosted research on synthesizing images from brain signals using Generative Adversarial Networks (GANs). In this work, we propose a framework for synthesizing images from brain activity recorded by an electroencephalogram (EEG) using small EEG datasets. The brain activity is recorded from the subject's scalp with EEG while they are asked to visualize certain classes of objects and English characters. In the proposed framework, we use a contrastive learning method to extract features from EEG signals and synthesize images from the extracted features using a conditional GAN. We modify the loss function used to train the GAN, which enables it to synthesize 128x128 images from a small number of images. Further, we conduct ablation studies and experiments showing the effectiveness of our framework over other state-of-the-art methods on the small EEG dataset.

Submitted 18 March, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

Comments: Accepted at ICASSP 2023
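Contrastive feature learning of the kind described in this abstract pulls embeddings of EEG recordings for the same visual class together and pushes different classes apart. The triplet margin loss below is one common contrastive objective; the abstract does not name the exact loss, so treat it as an assumed stand-in. The embeddings are assumed to come from some EEG encoder (not shown), and `triplet_loss`, `batch_triplet_loss`, and the default margin are hypothetical choices.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on distances: zero once the negative is at least `margin` farther than the positive."""
    d_pos = math.dist(anchor, positive)  # anchor vs. an embedding of the same visual class
    d_neg = math.dist(anchor, negative)  # anchor vs. an embedding of a different class
    return max(0.0, d_pos - d_neg + margin)

def batch_triplet_loss(triplets, margin=1.0):
    """Mean triplet loss over (anchor, positive, negative) embedding triples."""
    return sum(triplet_loss(a, p, n, margin) for a, p, n in triplets) / len(triplets)

# Toy 2-D embeddings: the first negative is already far away, the second is too close.
easy = ([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
hard = ([0.0, 0.0], [0.0, 1.0], [0.0, 1.5])
print(batch_triplet_loss([easy, hard]))
```

Only triples that violate the margin contribute gradient, so training concentrates on classes the encoder still confuses; the resulting features would then condition the GAN generator.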

arXiv:2212.03733

Tiered Reward: Designing Rewards for Specification and Fast Learning of Desired Behavior

Authors: Zhiyuan Zhou, Shreyas Sundara Raman, Henry Sowerby, Michael L. Littman

Abstract: Reinforcement-learning agents seek to maximize a reward signal through environmental interactions. As humans, our job in the learning process is to design reward functions that express desired behavior and enable the agent to learn such behavior swiftly. However, designing good reward functions to induce the desired behavior is generally hard, let alone determining which rewards make learning fast. In this work, we introduce a family of reward structures we call Tiered Reward that addresses both of these questions. We consider the reward-design problem in tasks formulated as reaching desirable states and avoiding undesirable states. To start, we propose a strict partial ordering of the policy space to resolve trade-offs in behavior preference: we prefer policies that reach the good states faster and with higher probability while avoiding the bad states longer. Next, we introduce Tiered Reward, a class of environment-independent reward functions, and show that it is guaranteed to induce policies that are Pareto-optimal according to our preference relation. Finally, we demonstrate that Tiered Reward leads to fast learning with multiple tabular and deep reinforcement-learning algorithms.

Submitted 1 August, 2024; v1 submitted 7 December, 2022; originally announced December 2022.

Comments: For code, see https://github.com/zhouzypaul/tiered-reward

Journal ref: Reinforcement Learning Journal, vol. 1, no. 1, 2024, pp. TBD
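The basic shape of a tiered reward can be illustrated on a tiny example: every state belongs to a tier (here bad, neutral, goal) and receives that tier's reward, with rewards strictly increasing toward the goal tier. This sketch shows only the structure; the specific conditions on tier values that define the class in the paper are not reproduced here, and the tier values, the discount γ = 0.9, and the function names are our assumptions.

```python
from typing import Sequence

# Hypothetical three tiers: 0 = bad (e.g. a hazard), 1 = neutral, 2 = goal.
# Rewards increase strictly from the worst tier to the goal tier (illustrative values).
TIER_REWARDS = (-10.0, -1.0, 0.0)

def tiered_reward(tier: int) -> float:
    """Reward depends only on the tier of the state, not on the environment's layout."""
    return TIER_REWARDS[tier]

def discounted_return(tiers: Sequence[int], gamma: float = 0.9) -> float:
    """Discounted return of an episode given the tier of each visited state.
    The episode is assumed to end when the goal tier is reached."""
    return sum(gamma**t * tiered_reward(k) for t, k in enumerate(tiers))

fast = [1, 1, 2]           # reaches the goal in two steps
slow = [1, 1, 1, 1, 2]     # reaches the goal in four steps
hazard = [1, 0, 2]         # passes through a bad state on the way
print(discounted_return(fast), discounted_return(slow), discounted_return(hazard))
```

Reaching the goal sooner and passing through fewer bad states both raise the return, mirroring the preference ordering described in the abstract.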

arXiv:2211.11040

PointResNet: Residual Network for 3D Point Cloud Segmentation and Classification

Authors: Aadesh Desai, Saagar Parikh, Seema Kumari, Shanmuganathan Raman

Abstract: Point cloud segmentation and classification are among the primary tasks in 3D computer vision, with applications ranging from augmented reality to robotics. However, processing point clouds with deep learning-based algorithms is challenging because of their irregular format. Voxelization and 3D grid-based representations are two ways of applying deep neural networks to this problem. In this paper, we propose PointResNet, a residual block-based approach. Our model processes the 3D points directly, using a deep neural network for the segmentation and classification tasks. The main components of the architecture are 1) residual blocks and 2) a multi-layer perceptron (MLP). We show that it preserves deep features and structural information that are useful for segmentation and classification. Experimental evaluations demonstrate that the proposed model produces the best results for segmentation and comparable results for classification compared with conventional baselines.

Submitted 20 November, 2022; originally announced November 2022.

Comments: Under review at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
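A residual block applied to per-point MLP features, the two components named in this abstract, can be sketched in plain Python. This is a toy under assumptions, not PointResNet's architecture: the layer widths, number of blocks, random initialization, class names, and the max-pooling step that turns per-point features into an order-independent global feature (an idea borrowed from PointNet-style models) are all our choices.

```python
import random

def linear(x, W, b):
    """Dense layer W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def init_layer(dim_in, dim_out, rng):
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim_in)] for _ in range(dim_out)]
    return W, [0.0] * dim_out

class ResidualPointBlock:
    """Shared per-point MLP with an identity skip: relu(x + L2(relu(L1(x))))."""
    def __init__(self, dim, rng):
        self.l1 = init_layer(dim, dim, rng)
        self.l2 = init_layer(dim, dim, rng)

    def __call__(self, x):
        h = relu(linear(x, *self.l1))
        f = linear(h, *self.l2)
        return relu([a + c for a, c in zip(x, f)])  # residual connection

class TinyPointResNet:
    def __init__(self, dim=8, n_blocks=2, seed=0):
        rng = random.Random(seed)
        self.lift = init_layer(3, dim, rng)  # (x, y, z) -> dim features
        self.blocks = [ResidualPointBlock(dim, rng) for _ in range(n_blocks)]

    def point_features(self, p):
        """Per-point features, e.g. the input to a segmentation head."""
        h = relu(linear(p, *self.lift))
        for block in self.blocks:
            h = block(h)
        return h

    def global_feature(self, cloud):
        """Max-pool over points, so the result does not depend on point order."""
        feats = [self.point_features(p) for p in cloud]
        return [max(column) for column in zip(*feats)]

cloud = [[0.1, 0.2, 0.3], [0.5, -0.1, 0.0], [-0.3, 0.4, 0.9]]
model = TinyPointResNet()
print(len(model.global_feature(cloud)))  # one 8-dimensional descriptor per cloud
```

Because every point passes through the same weights and the pooling is a max, reordering the points leaves the global feature unchanged, which matters since point clouds have no canonical ordering.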

Showing 1–50 of 107 results for author: Raman, S