-
LatentMove: Towards Complex Human Movement Video Generation
Authors:
Ashkan Taghipour,
Morteza Ghahremani,
Mohammed Bennamoun,
Farid Boussaid,
Aref Miri Rekavandi,
Zinuo Li,
Qiuhong Ke,
Hamid Laga
Abstract:
Image-to-video (I2V) generation seeks to produce realistic motion sequences from a single reference image. Although recent methods exhibit strong temporal consistency, they often struggle when dealing with complex, non-repetitive human movements, leading to unnatural deformations. To tackle this issue, we present LatentMove, a DiT-based framework specifically tailored for highly dynamic human animation. Our architecture incorporates a conditional control branch and learnable face/body tokens to preserve consistency as well as fine-grained details across frames. We introduce Complex-Human-Videos (CHV), a dataset featuring diverse, challenging human motions designed to benchmark the robustness of I2V systems. We also introduce two metrics to assess the flow and silhouette consistency of generated videos with their ground truth. Experimental results indicate that LatentMove substantially improves human animation quality--particularly when handling rapid, intricate movements--thereby pushing the boundaries of I2V generation. The code, the CHV dataset, and the evaluation metrics will be available at https://github.com/ --.
Submitted 27 June, 2025; v1 submitted 28 May, 2025;
originally announced May 2025.
-
MedBridge: Bridging Foundation Vision-Language Models to Medical Image Diagnosis
Authors:
Yitong Li,
Morteza Ghahremani,
Christian Wachinger
Abstract:
Recent vision-language foundation models deliver state-of-the-art results on natural image classification but falter on medical images due to pronounced domain shifts. At the same time, training a medical foundation model requires substantial resources, including extensive annotated data and high computational capacity. To bridge this gap with minimal overhead, we introduce MedBridge, a lightweight multimodal adaptation framework that re-purposes pretrained VLMs for accurate medical image diagnosis. MedBridge comprises three key components. First, a Focal Sampling module extracts high-resolution local regions to capture subtle pathological features and compensate for the limited input resolution of general-purpose VLMs. Second, a Query Encoder (QEncoder) injects a small set of learnable queries that attend to the frozen feature maps of the VLM, aligning them with medical semantics without retraining the entire backbone. Third, a Mixture of Experts mechanism, driven by learnable queries, harnesses the complementary strength of diverse VLMs to maximize diagnostic performance. We evaluate MedBridge on five medical imaging benchmarks across three key adaptation tasks, demonstrating its superior performance in both cross-domain and in-domain adaptation settings, even under varying levels of training data availability. Notably, MedBridge achieved over 6-15% improvement in AUC compared to state-of-the-art VLM adaptation methods in multi-label thoracic disease diagnosis, underscoring its effectiveness in leveraging foundation models for accurate and data-efficient medical diagnosis. Our code is available at https://github.com/ai-med/MedBridge.
Submitted 27 May, 2025;
originally announced May 2025.
-
DiaMond: Dementia Diagnosis with Multi-Modal Vision Transformers Using MRI and PET
Authors:
Yitong Li,
Morteza Ghahremani,
Youssef Wally,
Christian Wachinger
Abstract:
Diagnosing dementia, particularly for Alzheimer's Disease (AD) and frontotemporal dementia (FTD), is complex due to overlapping symptoms. While magnetic resonance imaging (MRI) and positron emission tomography (PET) data are critical for the diagnosis, integrating these modalities in deep learning faces challenges, often resulting in suboptimal performance compared to using single modalities. Moreover, the potential of multi-modal approaches in differential diagnosis, which holds significant clinical importance, remains largely unexplored. We propose a novel framework, DiaMond, to address these issues with vision Transformers to effectively integrate MRI and PET. DiaMond is equipped with self-attention and a novel bi-attention mechanism that synergistically combine MRI and PET, alongside a multi-modal normalization to reduce redundant dependency, thereby boosting the performance. DiaMond significantly outperforms existing multi-modal methods across various datasets, achieving a balanced accuracy of 92.4% in AD diagnosis, 65.2% for AD-MCI-CN classification, and 76.5% in differential diagnosis of AD and FTD. We also validated the robustness of DiaMond in a comprehensive ablation study. The code is available at https://github.com/ai-med/DiaMond.
Submitted 30 October, 2024;
originally announced October 2024.
-
Mamba? Catch The Hype Or Rethink What Really Helps for Image Registration
Authors:
Bailiang Jian,
Jiazhen Pan,
Morteza Ghahremani,
Daniel Rueckert,
Christian Wachinger,
Benedikt Wiestler
Abstract:
Our findings indicate that adopting "advanced" computational elements fails to significantly improve registration accuracy. Instead, well-established registration-specific designs offer fair improvements, enhancing results by a marginal 1.5% over the baseline. Our findings emphasize the importance of rigorous, unbiased evaluation and contribution disentanglement of all low- and high-level registration components, rather than simply following the computer vision trends with "more advanced" computational blocks. We advocate for simpler yet effective solutions and novel evaluation metrics that go beyond conventional registration accuracy, warranting further research across diverse organs and modalities. The code is available at https://github.com/BailiangJ/rethink-reg.
Submitted 27 July, 2024;
originally announced July 2024.
-
Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions
Authors:
Ashkan Taghipour,
Morteza Ghahremani,
Mohammed Bennamoun,
Aref Miri Rekavandi,
Zinuo Li,
Hamid Laga,
Farid Boussaid
Abstract:
This paper investigates the role of CLIP image embeddings within the Stable Video Diffusion (SVD) framework, focusing on their impact on video generation quality and computational efficiency. Our findings indicate that CLIP embeddings, while crucial for aesthetic quality, do not significantly contribute towards the subject and background consistency of video outputs. Moreover, the computationally expensive cross-attention mechanism can be effectively replaced by a simpler linear layer. This layer is computed only once at the first diffusion inference step, and its output is then cached and reused throughout the inference process, thereby enhancing efficiency while maintaining high-quality outputs. Building on these insights, we introduce VCUT, a training-free approach optimized for efficiency within the SVD architecture. VCUT eliminates temporal cross-attention and replaces spatial cross-attention with a one-time computed linear layer, significantly reducing computational load. The implementation of VCUT leads to a reduction of up to 322T Multiply-Accumulate Operations (MACs) per video and a decrease in model parameters by up to 50M, achieving a 20% reduction in latency compared to the baseline. Our approach demonstrates that conditioning during the Semantic Binding stage is sufficient, eliminating the need for continuous computation across all inference steps and setting a new standard for efficient video generation.
Submitted 27 July, 2024;
originally announced July 2024.
-
Extraordinary Quality Factors in Dual-Band Polarization-Insensitive QuasiBound States in the Continuum
Authors:
Maryam Ghahremani,
Carlos J. Zapata-Rodríguez
Abstract:
In this study, we investigate a novel "dimerized" dielectric metasurface featuring dual-mode resonances governed by symmetry-protected bound states in the continuum (BICs). The metasurface design offers advantages such as insensitivity to incident light polarization and exceptionally high quality factors exceeding 10$^5$ for low and moderate structural deviations from the monoatomic array. By introducing mastered perturbations in the metasurface via the Brillouin zone folding method, without reducing symmetry, we explore the behavior of symmetry-protected BIC states and their polarization-independent responses. Through numerical simulations and analysis, we demonstrate the superiority of Q factors for specific BIC-based resonances, leading to precise control over interaction behaviors and light engineering in both near and far fields. Our findings contribute to the understanding of BIC resonance interactions and offer insights into the design of high-performance sensing applications and meta-devices with enhanced functionalities.
Submitted 19 June, 2024;
originally announced June 2024.
-
Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation
Authors:
Jiajun Wang,
Morteza Ghahremani,
Yitong Li,
Björn Ommer,
Christian Wachinger
Abstract:
Controllable text-to-image (T2I) diffusion models have shown impressive performance in generating high-quality visual content through the incorporation of various conditions. Current methods, however, exhibit limited performance when guided by skeleton human poses, especially in complex pose conditions such as side or rear perspectives of human figures. To address this issue, we present Stable-Pose, a novel adapter model that introduces a coarse-to-fine attention masking strategy into a vision Transformer (ViT) to gain accurate pose guidance for T2I models. Stable-Pose is designed to adeptly handle pose conditions within pre-trained Stable Diffusion, providing a refined and efficient way of aligning pose representation during image synthesis. We leverage the query-key self-attention mechanism of ViTs to explore the interconnections among different anatomical parts in human pose skeletons. Masked pose images are used to smoothly refine the attention maps based on target pose-related features in a hierarchical manner, transitioning from coarse to fine levels. Additionally, our loss function is formulated to allocate increased emphasis to the pose region, thereby augmenting the model's precision in capturing intricate pose details. We assessed the performance of Stable-Pose across five public datasets under a wide range of indoor and outdoor human pose scenarios. Stable-Pose achieved an AP score of 57.1 in the LAION-Human dataset, marking around 13% improvement over the established technique ControlNet. The project link and code are available at https://github.com/ai-med/StablePose.
Submitted 5 November, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Simulating Light Propagation through Biological Media Using Monte-Carlo Method
Authors:
Maryam Ghahremani
Abstract:
Biological tissues are complex structures composed of many elements, which makes light-based tissue diagnostics challenging. Over the past decades, the Monte Carlo (MC) technique has been used as a fundamental and versatile approach toward modeling photon-tissue interactions. This report first describes an MC simulation of steady-state light transport in an absorbing and diffusing multi-layered structure. Further, a parallel processing solution is implemented to reduce execution time and memory requirements. Then, the nonparametric phase function, which is a discretized version of the phase function, is discussed; here the integration of the phase function is a numerical process instead of an analytical operation. Finally, to simulate more realistic structures of biological systems, simulations are modified to incorporate objects of various shapes (sphere, ellipsoid, or cylinder) with a refractive-index mismatched boundary. The output files mainly contain a 2D reflection matrix, a 2D transmission matrix, and a 3D absorption matrix.
Submitted 15 May, 2024; v1 submitted 10 May, 2024;
originally announced May 2024.
-
Metamaterial-induced-transparency engineering through quasi-bound states in the continuum by using dielectric cross-shaped trimers
Authors:
Maryam Ghahremani,
Carlos J. Zapata-Rodriguez
Abstract:
This study presents a novel approach to activate a narrowband transparency line within a reflecting broadband window in all-dielectric metasurfaces, in analogy to the electromagnetically-induced transparency effect, by means of a quasi-bound state in the continuum (qBIC). We demonstrate that the resonance overlapping of a bright mode and a qBIC-based nearly-dark mode with distinct Q-factors can be fully governed by a silicon trimer-based unit cell with a broken-inversion-symmetry cross shape, thus providing the required response under normal incidence of linearly-polarized light. Our analysis, derived from the far-field multipolar decomposition and near-field electromagnetic distributions, uncovers the main contributions of different multipoles to the qBIC resonance, with governing magnetic dipole and electric quadrupole terms supplied by distinct parts of the dielectric "molecule." The findings extracted from this research open up new avenues for the development of polarization-dependent technologies, with particular interest in its capabilities for sensing and biosensing.
Submitted 18 April, 2024;
originally announced April 2024.
-
Box It to Bind It: Unified Layout Control and Attribute Binding in T2I Diffusion Models
Authors:
Ashkan Taghipour,
Morteza Ghahremani,
Mohammed Bennamoun,
Aref Miri Rekavandi,
Hamid Laga,
Farid Boussaid
Abstract:
While latent diffusion models (LDMs) excel at creating imaginative images, they often lack precision in semantic fidelity and spatial control over where objects are generated. To address these deficiencies, we introduce the Box-it-to-Bind-it (B2B) module - a novel, training-free approach for improving spatial control and semantic accuracy in text-to-image (T2I) diffusion models. B2B targets three key challenges in T2I: catastrophic neglect, attribute binding, and layout guidance. The process encompasses two main steps: i) Object generation, which adjusts the latent encoding to guarantee object generation and directs it within specified bounding boxes, and ii) attribute binding, guaranteeing that generated objects adhere to their specified attributes in the prompt. B2B is designed as a compatible plug-and-play module for existing T2I models, markedly enhancing model performance in addressing the key challenges. We evaluate our technique using the established CompBench and TIFA score benchmarks, demonstrating significant performance improvements compared to existing methods. The source code will be made publicly available at https://github.com/nextaistudio/BoxIt2BindIt.
Submitted 27 February, 2024;
originally announced February 2024.
-
No-Clean-Reference Image Super-Resolution: Application to Electron Microscopy
Authors:
Mohammad Khateri,
Morteza Ghahremani,
Alejandra Sierra,
Jussi Tohka
Abstract:
The inability to acquire clean high-resolution (HR) electron microscopy (EM) images over a large brain tissue volume hampers many neuroscience studies. To address this challenge, we propose a deep-learning-based image super-resolution (SR) approach to computationally reconstruct clean HR 3D-EM with a large field of view (FoV) from noisy low-resolution (LR) acquisition. Our contributions are: I) Investigating training with no-clean references for $\ell_2$ and $\ell_1$ loss functions; II) Introducing a novel network architecture, named EMSR, for enhancing the resolution of LR EM images while reducing inherent noise; and III) Comparing different training strategies including using acquired LR and HR image pairs, i.e., real pairs with no-clean references contaminated with real corruptions, the pairs of synthetic LR and acquired HR, as well as acquired LR and denoised HR pairs. Experiments with nine brain datasets showed that training with real pairs can produce high-quality super-resolved results, demonstrating the feasibility of training with non-clean references for both loss functions. Additionally, comparable results were observed, both visually and numerically, when employing denoised and noisy references for training. Moreover, utilizing the network trained with synthetically generated LR images from HR counterparts proved effective in yielding satisfactory SR results, in certain cases even outperforming training with real pairs. The proposed SR network was compared quantitatively and qualitatively with several established SR techniques, showcasing either the superiority or competitiveness of the proposed method in mitigating noise while recovering fine details.
Submitted 26 January, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
RegBN: Batch Normalization of Multimodal Data with Regularization
Authors:
Morteza Ghahremani,
Christian Wachinger
Abstract:
Recent years have witnessed a surge of interest in integrating high-dimensional data captured by multisource sensors, driven by the impressive success of neural networks in the integration of multimodal data. However, the integration of heterogeneous multimodal data poses a significant challenge, as confounding effects and dependencies among such heterogeneous data sources introduce unwanted variability and bias, leading to suboptimal performance of multimodal models. Therefore, it becomes crucial to normalize the low- or high-level features extracted from data modalities before their fusion takes place. This paper introduces a novel approach for the normalization of multimodal data, called RegBN, that incorporates regularization. RegBN uses the Frobenius norm as a regularizer term to address the side effects of confounders and underlying dependencies among different data sources. The proposed method generalizes well across multiple modalities and eliminates the need for learnable parameters, simplifying training and inference. We validate the effectiveness of RegBN on eight databases from five research areas, encompassing diverse modalities such as language, audio, image, video, depth, tabular, and 3D MRI. The proposed method demonstrates broad applicability across different architectures such as multilayer perceptrons, convolutional neural networks, and vision transformers, enabling effective normalization of both low- and high-level features in multimodal neural networks. RegBN is available at https://github.com/mogvision/regbn.
Submitted 19 November, 2023; v1 submitted 1 October, 2023;
originally announced October 2023.
-
Self-Supervised Super-Resolution Approach for Isotropic Reconstruction of 3D Electron Microscopy Images from Anisotropic Acquisition
Authors:
Mohammad Khateri,
Morteza Ghahremani,
Alejandra Sierra,
Jussi Tohka
Abstract:
Three-dimensional electron microscopy (3DEM) is an essential technique to investigate volumetric tissue ultra-structure. Due to technical limitations and high imaging costs, samples are often imaged anisotropically, where resolution in the axial direction ($z$) is lower than in the lateral directions $(x,y)$. This anisotropy in 3DEM can hamper subsequent analysis and visualization tasks. To overcome this limitation, we propose a novel deep-learning (DL)-based self-supervised super-resolution approach that computationally reconstructs isotropic 3DEM from the anisotropic acquisition. The proposed DL-based framework is built upon the U-shape architecture incorporating vision-transformer (ViT) blocks, enabling high-capability learning of local and global multi-scale image dependencies. To train the tailored network, we employ a self-supervised approach. Specifically, we generate pairs of anisotropic and isotropic training datasets from the given anisotropic 3DEM data. By feeding the given anisotropic 3DEM dataset into the trained network through our proposed framework, the isotropic 3DEM is obtained. Importantly, this isotropic reconstruction approach relies solely on the given anisotropic 3DEM dataset and does not require pairs of co-registered anisotropic and isotropic 3DEM training datasets. To evaluate the effectiveness of the proposed method, we conducted experiments using three 3DEM datasets acquired from the brain. The experimental results demonstrated that our proposed framework could successfully reconstruct isotropic 3DEM from the anisotropic acquisition.
Submitted 19 September, 2023;
originally announced September 2023.
-
Safe Edges: A Study of Triangulation in Fill-in and Tree-Width Problems
Authors:
Mani Ghahremani,
Janka Chlebikova
Abstract:
This paper considers two well-studied problems, Minimum Fill-In (Min Fill-In) and Treewidth. Since both problems are NP-hard, various reduction rules simplifying an input graph have been intensively studied to better understand the structural properties relevant to these problems. Bodlaender et al. introduced the concept of a safe edge that is included in a solution of the Minimum Fill-In problem and showed some initial results. In this paper, we extend their result and prove a new condition for an edge set to be safe. This in turn helps us to construct a novel reduction tool for Min Fill-In that we use to answer other questions related to the problem.
In this paper, we also study another interesting research question: whether there exists a triangulation that answers both problems, Min Fill-In and Treewidth. To formalise our study, we introduce a new parameter reflecting a distance of triangulations optimising both problems. We present some initial results regarding this parameter and study graph classes where both problems can be solved with one triangulation.
Submitted 30 June, 2023;
originally announced June 2023.
-
Adversarial Distortion Learning for Medical Image Denoising
Authors:
Morteza Ghahremani,
Mohammad Khateri,
Alejandra Sierra,
Jussi Tohka
Abstract:
We present a novel adversarial distortion learning (ADL) approach for denoising two- and three-dimensional (2D/3D) biomedical image data. The proposed ADL consists of two auto-encoders: a denoiser and a discriminator. The denoiser removes noise from input data and the discriminator compares the denoised result to its noise-free counterpart. This process is repeated until the discriminator cannot differentiate the denoised data from the reference. Both the denoiser and the discriminator are built upon a proposed auto-encoder called Efficient-Unet. Efficient-Unet has a light architecture that uses residual blocks and a novel pyramidal approach in the backbone to efficiently extract and re-use feature maps. During training, the textural information and contrast are controlled by two novel loss functions. The architecture of Efficient-Unet allows generalizing the proposed method to any sort of biomedical data. The 2D version of our network was trained on ImageNet and tested on biomedical datasets whose distribution is completely different from ImageNet, so there is no need for re-training. Experimental results on magnetic resonance imaging (MRI), dermatoscopy, electron microscopy and X-ray datasets show that the proposed method achieved the best results on each benchmark. Our implementation and pre-trained models are available at https://github.com/mogvision/ADL.
Submitted 12 March, 2024; v1 submitted 29 April, 2022;
originally announced April 2022.
-
Regional Attention Network (RAN) for Head Pose and Fine-grained Gesture Recognition
Authors:
Ardhendu Behera,
Zachary Wharton,
Morteza Ghahremani,
Swagat Kumar,
Nik Bessis
Abstract:
Affect is often expressed via non-verbal body language such as actions/gestures, which are vital indicators for human behaviors. Recent studies on recognition of fine-grained actions/gestures in monocular images have mainly focused on modeling spatial configuration of body parts representing body pose, human-object interactions and variations in local appearance. The results show that this is a brittle approach since it relies on accurate body parts/objects detection. In this work, we argue that there exist local discriminative semantic regions, whose "informativeness" can be evaluated by the attention mechanism for inferring fine-grained gestures/actions. To this end, we propose a novel end-to-end Regional Attention Network (RAN), which is a fully Convolutional Neural Network (CNN) to combine multiple contextual regions through attention mechanism, focusing on parts of the images that are most relevant to a given task. Our regions consist of one or more consecutive cells and are adapted from the strategies used in computing the HOG (Histogram of Oriented Gradients) descriptor. The model is extensively evaluated on ten datasets belonging to three different scenarios: 1) head pose recognition, 2) driver state recognition, and 3) human action and facial expression recognition. The proposed approach outperforms the state-of-the-art by a considerable margin in different metrics.
Submitted 17 January, 2021;
originally announced January 2021.
-
FFD: Fast Feature Detector
Authors:
Morteza Ghahremani,
Yonghuai Liu,
Bernard Tiddeman
Abstract:
Scale-invariance, good localization and robustness to noise and distortions are the main properties that a local feature detector should possess. Most existing local feature detectors find excessive unstable feature points that increase the number of keypoints to be matched and the computational time of the matching step. In this paper, we show that robust and accurate keypoints exist in the specific scale-space domain. To this end, we first formulate the superimposition problem into a mathematical model and then derive a closed-form solution for multiscale analysis. The model is formulated via difference-of-Gaussian (DoG) kernels in the continuous scale-space domain, and it is proved that setting the scale-space pyramid's blurring ratio and smoothness to 2 and 0.627, respectively, facilitates the detection of reliable keypoints. For the applicability of the proposed model to discrete images, we discretize it using the undecimated wavelet transform and the cubic spline function. Theoretically, the complexity of our method is less than 5% of that of the popular baseline Scale Invariant Feature Transform (SIFT). Extensive experimental results show the superiority of the proposed feature detector over the existing representative hand-crafted and learning-based techniques in accuracy and computational time. The code and supplementary materials can be found at https://github.com/mogvision/FFD.
Submitted 1 December, 2020;
originally announced December 2020.
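As a rough illustration of the difference-of-Gaussian (DoG) kernel mentioned in the FFD abstract above, the following minimal sketch builds a 1D DoG kernel from two sampled Gaussians. It is not the authors' implementation: the function names, the kernel radius, and in particular the reading of the "smoothness" value 0.627 as the base standard deviation are assumptions made only for this example; the abstract's blurring ratio of 2 is used as the ratio between the two standard deviations.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a 1D Gaussian sampled at integer offsets, normalised to sum to 1."""
    weights = [
        math.exp(-(x * x) / (2.0 * sigma * sigma))
        for x in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def dog_kernel(sigma, ratio, radius):
    """Return the wider Gaussian (ratio * sigma) minus the narrower one (sigma)."""
    wide = gaussian_kernel(ratio * sigma, radius)
    narrow = gaussian_kernel(sigma, radius)
    return [w - n for w, n in zip(wide, narrow)]


# Assumption: 0.627 is treated as the base standard deviation and 2 as the
# ratio between consecutive scales; radius=4 is an arbitrary illustrative size.
kernel = dog_kernel(sigma=0.627, ratio=2.0, radius=4)
```

Because both Gaussians are normalised before subtraction, the resulting kernel sums to approximately zero, so it responds to local structure rather than to constant intensity.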
-
Orderly Disorder in Point Cloud Domain
Authors:
Morteza Ghahremani,
Bernard Tiddeman,
Yonghuai Liu,
Ardhendu Behera
Abstract:
In the real world, out-of-distribution samples, noise and distortions exist in test data. Existing deep networks developed for point cloud data analysis are prone to overfitting, and a partial change in test data leads to unpredictable behaviour of the networks. In this paper, we propose a smart yet simple deep network for analysis of 3D models using 'orderly disorder' theory. Orderly disorder is a way of describing the complex structure of disorders within complex systems. Our method extracts the deep patterns inside a 3D object by creating a dynamic link to seek the most stable patterns while discarding the unstable ones. Patterns are more robust to changes in data distribution, especially those that appear in the top layers. Features are extracted via an innovative cloning decomposition technique and then linked to each other to form stable complex patterns. Our model alleviates the vanishing-gradient problem, strengthens dynamic link propagation and substantially reduces the number of parameters. Extensive experiments on challenging benchmark datasets verify the superiority of our light network on the segmentation and classification tasks, especially in the presence of noise, wherein our network's performance drops by less than 10% while the state-of-the-art networks fail to work.
Submitted 21 August, 2020;
originally announced August 2020.
-
Electric-field Controlled Magnetization Switching in Co/Pt thin-Film Ferromagnets
Authors:
A. Siddique,
S. Gu,
R. Witte,
M. Ghahremani,
C. A. Nwokoye,
A. Aslani,
R. Kruk,
V. Provenzano,
L. H. Bennett,
E. Della Torre
Abstract:
A study of dynamic and reversible voltage-controlled magnetization switching in a ferromagnetic Co/Pt thin film with perpendicular magnetic anisotropy at room temperature is presented. The change in the magnetic properties of the system is observed in a relatively thick film of 15 nm. A surface charge is induced by the formation of an electrochemical double layer between the metallic thin film and a non-aqueous lithium LiClO4 electrolyte to manipulate the magnetism. The change in the magnetic properties occurred upon the application of an external electric field. As the negative voltage was increased, the coercivity and the switching magnetic field decreased, thus activating magnetization switching. The results are envisaged to lead to faster and ultra-low power magnetization switching as compared to spin-transfer torque (STT) switching in spintronic devices.
Submitted 11 November, 2015;
originally announced November 2015.
-
Optimization of Magnetic Refrigerators by Tuning the Heat Transfer Medium and Operating Conditions
Authors:
Mohammadreza Ghahremani,
Amir Aslani,
Lawrence H. Bennett,
Edward Della Torre
Abstract:
A new experimental test bed has been designed, built, and tested to evaluate the effect of the system parameters on a reciprocating Active Magnetic Regenerator (AMR) near room temperature. Bulk gadolinium was used as the refrigerant, silicon oil as the heat transfer medium, and a magnetic field of 1.3 T was cycled. This study focuses on the methodology of single-stage AMR operating conditions to obtain a higher temperature span near room temperature. Herein, the main objective is not to report the absolute maximum attainable temperature span seen in an AMR system, but rather to find the system's optimal operating conditions to reach that maximum span. The results of this research show that there is an optimal operating frequency, heat transfer fluid flow rate, flow duration, and displaced volume ratio in an AMR system. By optimizing these parameters, the refrigeration performance increased by 24%. It is expected that such optimization will permit the design of a more efficient magnetic refrigeration system.
Submitted 7 November, 2015;
originally announced November 2015.