-
ABHINAYA -- A System for Speech Emotion Recognition In Naturalistic Conditions Challenge
Authors:
Soumya Dutta,
Smruthi Balaji,
Varada R,
Viveka Salinamakki,
Sriram Ganapathy
Abstract:
Speech emotion recognition (SER) in naturalistic settings remains a challenge due to the intrinsic variability, diverse recording conditions, and class imbalance. As participants in the Interspeech Naturalistic SER Challenge which focused on these complexities, we present Abhinaya, a system integrating speech-based, text-based, and speech-text models. Our approach fine-tunes self-supervised and speech large language models (SLLM) for speech representations, leverages large language models (LLM) for textual context, and employs speech-text modeling with an SLLM to capture nuanced emotional cues. To combat class imbalance, we apply tailored loss functions and generate categorical decisions through majority voting. Despite one model not being fully trained, the Abhinaya system ranked 4th among 166 submissions. Upon completion of training, it achieved state-of-the-art performance among published results, demonstrating the effectiveness of our approach for SER in real-world conditions.
Submitted 23 May, 2025;
originally announced May 2025.
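The majority-voting step mentioned in the Abhinaya abstract can be illustrated with a minimal sketch. The labels, the number of models, and the tie-breaking rule below are assumptions made for illustration; the abstract does not describe how ties are resolved or how many models vote.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most models.

    Ties go to the tied label that appears first in `predictions`
    (an assumption; the abstract does not specify a tie-break rule).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical per-model decisions for a single utterance.
votes = ["happy", "neutral", "happy", "sad", "happy"]
print(majority_vote(votes))
```

Voting on hard labels keeps the fusion step independent of how each model scores its classes, at the cost of discarding confidence information.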
-
Audio-to-Audio Emotion Conversion With Pitch And Duration Style Transfer
Authors:
Soumya Dutta,
Avni Jain,
Sriram Ganapathy
Abstract:
Given a pair of source and reference speech recordings, audio-to-audio (A2A) style transfer involves the generation of an output speech that mimics the style characteristics of the reference while preserving the content and speaker attributes of the source. In this paper, we propose a novel framework, termed as A2A Zero-shot Emotion Style Transfer (A2A-ZEST), that enables the transfer of reference emotional attributes to the source while retaining its speaker and speech contents. The A2A-ZEST framework consists of an analysis-synthesis pipeline, where the analysis module decomposes speech into semantic tokens, speaker representations, and emotion embeddings. Using these representations, a pitch contour estimator and a duration predictor are learned. Further, a synthesis module is designed to generate speech based on the input representations and the derived factors. This entire paradigm of analysis-synthesis is trained purely in a self-supervised manner with an auto-encoding loss. For A2A emotion style transfer, the emotion embedding extracted from the reference speech along with the rest of the representations from the source speech are used in the synthesis module to generate the style translated speech. In our experiments, we evaluate the converted speech on content/speaker preservation (w.r.t. source) as well as on the effectiveness of the emotion style transfer (w.r.t. reference). The proposal, A2A-ZEST, is shown to improve over other prior works on these evaluations, thereby enabling style transfer without any parallel training data. We also illustrate the application of the proposed work for data augmentation in emotion recognition tasks.
Submitted 23 May, 2025;
originally announced May 2025.
-
QUIET-SR: Quantum Image Enhancement Transformer for Single Image Super-Resolution
Authors:
Siddhant Dutta,
Nouhaila Innan,
Khadijeh Najafi,
Sadok Ben Yahia,
Muhammad Shafique
Abstract:
Recent advancements in Single-Image Super-Resolution (SISR) using deep learning have significantly improved image restoration quality. However, the high computational cost of processing high-resolution images due to the large number of parameters in classical models, along with the scalability challenges of quantum algorithms for image processing, remains a major obstacle. In this paper, we propose the Quantum Image Enhancement Transformer for Super-Resolution (QUIET-SR), a hybrid framework that extends the Swin transformer architecture with a novel shifted quantum window attention mechanism, built upon variational quantum neural networks. QUIET-SR effectively captures complex residual mappings between low-resolution and high-resolution images, leveraging quantum attention mechanisms to enhance feature extraction and image restoration while requiring a minimal number of qubits, making it suitable for the Noisy Intermediate-Scale Quantum (NISQ) era. We evaluate our framework in MNIST (30.24 PSNR, 0.989 SSIM), FashionMNIST (29.76 PSNR, 0.976 SSIM) and the MedMNIST dataset collection, demonstrating that QUIET-SR achieves PSNR and SSIM scores comparable to state-of-the-art methods while using fewer parameters. These findings highlight the potential of scalable variational quantum machine learning models for SISR, marking a step toward practical quantum-enhanced image super-resolution.
Submitted 11 March, 2025;
originally announced March 2025.
-
Counterfactual Explanations for Model Ensembles Using Entropic Risk Measures
Authors:
Erfaun Noorani,
Pasan Dissanayake,
Faisal Hamman,
Sanghamitra Dutta
Abstract:
Counterfactual explanations indicate the smallest change in input that can translate to a different outcome for a machine learning model. Counterfactuals have generated immense interest in high-stakes applications such as finance, education, hiring, etc. In several use-cases, the decision-making process often relies on an ensemble of models rather than just one. Despite significant research on counterfactuals for one model, the problem of generating a single counterfactual explanation for an ensemble of models has received limited interest. Each individual model might lead to a different counterfactual, whereas trying to find a counterfactual accepted by all models might significantly increase cost (effort). We propose a novel strategy to find the counterfactual for an ensemble of models using the perspective of entropic risk measure. Entropic risk is a convex risk measure that satisfies several desirable properties. We incorporate our proposed risk measure into a novel constrained optimization to generate counterfactuals for ensembles that stay valid for several models. The main significance of our measure is that it provides a knob that allows for the generation of counterfactuals that stay valid under an adjustable fraction of the models. We also show that a limiting case of our entropic-risk-based strategy yields a counterfactual valid for all models in the ensemble (worst-case min-max approach). We study the trade-off between the cost (effort) for the counterfactual and its validity for an ensemble by varying degrees of risk aversion, as determined by our risk parameter knob. We validate our performance on real-world datasets.
Submitted 10 March, 2025;
originally announced March 2025.
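The entropic risk measure named in the abstract above has a standard form for a loss X and a risk-aversion parameter θ > 0: (1/θ) log E[exp(θX)]. The sketch below evaluates it over a small ensemble to show how θ acts as the "knob" between the average case and the worst case. The per-model loss values are hypothetical, and this is not the authors' constrained optimization, only the risk measure itself.

```python
import math


def entropic_risk(losses, theta):
    """Entropic risk (1/theta) * log(mean(exp(theta * loss))), theta > 0.

    Uses the log-sum-exp trick (factoring out the largest loss) so that
    large theta values do not overflow.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    if not losses:
        raise ValueError("need at least one loss")
    m = max(losses)
    s = sum(math.exp(theta * (x - m)) for x in losses)
    return m + math.log(s / len(losses)) / theta


# Hypothetical losses of one candidate counterfactual under three models.
model_losses = [0.1, 0.4, 0.9]
for theta in (1e-3, 1.0, 10.0, 1e3):
    print(theta, entropic_risk(model_losses, theta))
```

As θ approaches 0 the value approaches the mean loss across models, and as θ grows it approaches the maximum loss, which corresponds to the worst-case (min-max) limiting case the abstract mentions.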
-
LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
Authors:
Soumya Dutta,
Sriram Ganapathy
Abstract:
Emotion recognition in conversations (ERC) is challenging due to the multimodal nature of the emotion expression. In this paper, we propose to pretrain a text-based recognition model from unsupervised speech transcripts with LLM guidance. These transcriptions are obtained from a raw speech dataset with a pre-trained ASR system. A text LLM is queried to provide pseudo-labels for these transcripts, and these pseudo-labeled transcripts are subsequently used for learning an utterance level text-based emotion recognition model. We use the utterance level text embeddings for emotion recognition in conversations along with speech embeddings obtained from a recently proposed pre-trained model. A hierarchical way of training the speech-text model is proposed, keeping in mind the conversational nature of the dataset. We perform experiments on three established datasets, namely, IEMOCAP, MELD, and CMU-MOSI, where we illustrate that the proposed model improves over other benchmarks and achieves state-of-the-art results on two out of these three datasets.
Submitted 20 January, 2025;
originally announced January 2025.
-
Private Counterfactual Retrieval With Immutable Features
Authors:
Shreya Meel,
Pasan Dissanayake,
Mohamed Nomeir,
Sanghamitra Dutta,
Sennur Ulukus
Abstract:
In a classification task, counterfactual explanations provide the minimum change needed for an input to be classified into a favorable class. We consider the problem of privately retrieving the exact closest counterfactual from a database of accepted samples while enforcing that certain features of the input sample cannot be changed, i.e., they are immutable. An applicant (user) whose feature vector is rejected by a machine learning model wants to retrieve the sample closest to them in the database without altering a private subset of their features, which constitutes the immutable set. While doing this, the user should keep their feature vector, immutable set and the resulting counterfactual index information-theoretically private from the institution. We refer to this as the immutable private counterfactual retrieval (I-PCR) problem, which generalizes PCR to a more practical setting. In this paper, we propose two I-PCR schemes by leveraging techniques from private information retrieval (PIR) and characterize their communication costs. Further, we quantify the information that the user learns about the database and compare it for the proposed schemes.
Submitted 15 November, 2024;
originally announced November 2024.
-
Quantifying Knowledge Distillation Using Partial Information Decomposition
Authors:
Pasan Dissanayake,
Faisal Hamman,
Barproda Halder,
Ilia Sucholutsky,
Qiuyi Zhang,
Sanghamitra Dutta
Abstract:
Knowledge distillation deploys complex machine learning models in resource-constrained environments by training a smaller student model to emulate internal representations of a complex teacher model. However, the teacher's representations can also encode nuisance or additional information not relevant to the downstream task. Distilling such irrelevant information can actually impede the performance of a capacity-limited student model. This observation motivates our primary question: What are the information-theoretic limits of knowledge distillation? To this end, we leverage Partial Information Decomposition to quantify and explain the transferred knowledge and knowledge left to distill for a downstream task. We theoretically demonstrate that the task-relevant transferred knowledge is succinctly captured by the measure of redundant information about the task between the teacher and student. We propose a novel multi-level optimization to incorporate redundant information as a regularizer, leading to our framework of Redundant Information Distillation (RID). RID leads to more resilient and effective distillation under nuisance teachers as it succinctly quantifies task-relevant knowledge rather than simply aligning student and teacher representations.
Submitted 4 April, 2025; v1 submitted 11 November, 2024;
originally announced November 2024.
-
Private Counterfactual Retrieval
Authors:
Mohamed Nomeir,
Pasan Dissanayake,
Shreya Meel,
Sanghamitra Dutta,
Sennur Ulukus
Abstract:
Transparency and explainability are two extremely important aspects to be considered when employing black-box machine learning models in high-stakes applications. Providing counterfactual explanations is one way of catering to this requirement. However, this also poses a threat to the privacy of both the institution that is providing the explanation as well as the user who is requesting it. In this work, we propose multiple schemes inspired by private information retrieval (PIR) techniques which ensure the user's privacy when retrieving counterfactual explanations. We present a scheme which retrieves the exact nearest neighbor counterfactual explanation from a database of accepted points while achieving perfect (information-theoretic) privacy for the user. While the scheme achieves perfect privacy for the user, some leakage on the database is inevitable, which we quantify using a mutual information based metric. Furthermore, we propose strategies to reduce this leakage to achieve an advanced degree of database privacy. We extend these schemes to incorporate the user's preference on transforming their attributes, so that a more actionable explanation can be received. Since our schemes rely on finite field arithmetic, we empirically validate our schemes on real datasets to understand the trade-off between the accuracy and the finite field sizes.
Submitted 17 October, 2024;
originally announced October 2024.
-
Leveraging Content and Acoustic Representations for Speech Emotion Recognition
Authors:
Soumya Dutta,
Sriram Ganapathy
Abstract:
Speech emotion recognition (SER), the task of identifying the expression of emotion from spoken content, is challenging due to the difficulty in extracting representations that capture emotional attributes from speech. The scarcity of labeled datasets further complicates the challenge where large models are prone to over-fitting. In this paper, we propose CARE (Content and Acoustic Representations of Emotions), where we design a dual encoding scheme which emphasizes semantic and acoustic factors of speech. While the semantic encoder is trained using distillation from utterance-level text representations, the acoustic encoder is trained to predict low-level frame-wise features of the speech signal. The proposed dual encoding scheme is a base-sized model trained only on unsupervised raw speech. With a simple light-weight classification model trained on the downstream task, we show that the CARE embeddings provide effective emotion recognition on a variety of datasets. We compare the proposal with several other self-supervised models as well as recent large-language model based approaches. In these evaluations, the proposed CARE is shown to be the best performing model based on average performance across 8 diverse datasets. We also conduct several ablation studies to analyze the importance of various design choices.
Submitted 17 December, 2024; v1 submitted 9 September, 2024;
originally announced September 2024.
-
Navigating the United States Legislative Landscape on Voice Privacy: Existing Laws, Proposed Bills, Protection for Children, and Synthetic Data for AI
Authors:
Satwik Dutta,
John H. L. Hansen
Abstract:
Privacy is a hot topic for policymakers across the globe, including the United States. Evolving advances in AI and emerging concerns about the misuse of personal data have pushed policymakers to draft legislation on trustworthy AI and privacy protection for its citizens. This paper presents the state of the privacy legislation at the U.S. Congress and outlines how voice data is considered as part of the legislation definition. This paper also reviews additional privacy protection for children. This paper presents a holistic review of enacted and proposed privacy laws, and consideration for voice data, including guidelines for processing children's data, in those laws across the fifty U.S. states. As a groundbreaking alternative to actual human data, ethically generated synthetic data allows much flexibility to keep AI innovation in progress. Given the consideration of synthetic data in AI legislation by policymakers to be relatively new, as compared to that of privacy laws, this paper reviews regulatory considerations for synthetic data.
Submitted 28 July, 2024;
originally announced July 2024.
-
A novel perspective on denoising using quantum localization with application to medical imaging
Authors:
Amirreza Hashemi,
Sayantan Dutta,
Bertrand Georgeot,
Denis Kouame,
Hamid Sabet
Abstract:
Background noise in many fields such as medical imaging poses significant challenges for accurate diagnosis, prompting the development of denoising algorithms. Traditional methodologies, however, often struggle to address the complexities of noisy environments in high dimensional imaging systems. This paper introduces a novel quantum-inspired approach for image denoising, drawing upon principles of quantum and condensed matter physics. Our approach views medical images as amorphous structures akin to those found in condensed matter physics and we propose an algorithm that incorporates the concept of mode resolved localization directly into the denoising process. Notably, unlike previous studies that considered localization as a hindrance, our approach considers quantum localization as a fundamental component of image reconstruction which is used to differentiate between noisy and non-noisy modes based on diffusivity and localization measurements. This perspective eliminates the need for hyperparameter tuning, making the proposed method a standalone algorithm which can be implemented with minimal manual intervention and can perform automatic filtering of noise regardless of noise level. Through numerical validation, we showcase the effectiveness of our approach in addressing noise-related challenges in imaging and especially medical imaging, underscoring its relevance for possible quantum computing applications.
Submitted 30 January, 2025; v1 submitted 22 April, 2024;
originally announced May 2024.
-
Few Shot Class Incremental Learning using Vision-Language models
Authors:
Anurag Kumar,
Chinmay Bharti,
Saikat Dutta,
Srikrishna Karanam,
Biplab Banerjee
Abstract:
Recent advancements in deep learning have demonstrated remarkable performance comparable to human capabilities across various supervised computer vision tasks. However, the prevalent assumption of having an extensive pool of training data encompassing all classes prior to model training often diverges from real-world scenarios, where limited data availability for novel classes is the norm. The challenge emerges in seamlessly integrating new classes with few samples into the training data, demanding the model to adeptly accommodate these additions without compromising its performance on base classes. To address this exigency, the research community has introduced several solutions under the realm of few-shot class incremental learning (FSCIL).
In this study, we introduce an innovative FSCIL framework that utilizes language regularizer and subspace regularizer. During base training, the language regularizer helps incorporate semantic information extracted from a Vision-Language model. The subspace regularizer helps in facilitating the model's acquisition of nuanced connections between image and text semantics inherent to base classes during incremental training. Our proposed framework not only empowers the model to embrace novel classes with limited data, but also ensures the preservation of performance on base classes. To substantiate the efficacy of our approach, we conduct comprehensive experiments on three distinct FSCIL benchmarks, where our framework attains state-of-the-art performance.
Submitted 15 August, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
Design and Simulation of Time-energy Optimal Anti-swing Trajectory Planner for Autonomous Tower Cranes
Authors:
Souravik Dutta,
Yiyu Cai
Abstract:
For autonomous crane lifting, optimal trajectories of the crane are required as reference inputs to the crane controller to facilitate feedforward control. Reducing the unactuated payload motion is a crucial issue for under-actuated tower cranes with spherical pendulum dynamics. The planned trajectory should be optimal in terms of both operating time and energy consumption, to facilitate optimum output spending optimum effort. This article proposes an anti-swing tower crane trajectory planner that can provide time-energy optimal solutions for the Computer-Aided Lift Planning (CALP) system developed at Nanyang Technological University, which facilitates collision-free lifting path planning of robotized tower cranes in autonomous construction sites. The current work introduces a trajectory planning module to the system that utilizes the geometric outputs from the path planning module and optimally scales them with time information. Firstly, analyzing the non-linear dynamics of the crane operations, the tower crane is established as differentially flat. Subsequently, the multi-objective trajectory optimization problems for all the crane operations are formulated in the flat output space through consideration of the mechanical and safety constraints. Two multi-objective evolutionary algorithms, namely Non-dominated Sorting Genetic Algorithm (NSGA-II) and Generalized Differential Evolution 3 (GDE3), are extensively compared via statistical measures based on the closeness of solutions to the Pareto front, distribution of solutions in the solution space and the runtime, to select the optimization engine of the planner. Finally, the crane operation trajectories are obtained via the corresponding planned flat output trajectories. Studies simulating real-world lifting scenarios are conducted to verify the effectiveness and reliability of the proposed module of the lift planning system.
Submitted 8 April, 2024;
originally announced April 2024.
-
Automatic Tuning of Denoising Algorithms Parameters Without Ground Truth
Authors:
Arthur Floquet,
Sayantan Dutta,
Emmanuel Soubies,
Duong Hung Pham,
Denis Kouame
Abstract:
Denoising is omnipresent in image processing. It is usually addressed with algorithms relying on a set of hyperparameters that control the quality of the recovered image. Manual tuning of those parameters can be a daunting task, which calls for the development of automatic tuning methods. Given a denoising algorithm, the best set of parameters is the one that minimizes the error between denoised and ground-truth images. Clearly, this ideal approach is unrealistic, as the ground-truth images are unknown in practice. In this work, we propose unsupervised cost functions -- i.e., that only require the noisy image -- that allow us to reach this ideal gold standard performance. Specifically, the proposed approach makes it possible to obtain an average PSNR output within less than 1% of the best achievable PSNR.
Submitted 18 January, 2024;
originally announced January 2024.
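The "within less than 1% of the best achievable PSNR" claim in the abstract above rests on the standard PSNR definition, 10 log10(peak² / MSE). The sketch below computes PSNR and the relative gap to a reference value. Images are represented as flat lists of pixel values and an 8-bit peak of 255 is assumed; the helper name `relative_gap` and the sample numbers are hypothetical and do not come from the paper.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def relative_gap(best_psnr, achieved_psnr):
    """Fraction by which an achieved PSNR falls short of the best PSNR."""
    return (best_psnr - achieved_psnr) / best_psnr


# Hypothetical values: an oracle-tuned denoiser vs. an unsupervised tuning.
print(relative_gap(30.0, 29.8))
```

Note that the oracle PSNR requires the ground-truth image, so in practice it is only available in benchmark settings, which is the gap the unsupervised cost functions are meant to close.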
-
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
Authors:
Soumya Dutta,
Sriram Ganapathy
Abstract:
The problem of audio-to-audio (A2A) style transfer involves replacing the style features of the source audio with those from the target audio while preserving the content related attributes of the source audio. In this paper, we propose an efficient approach, termed as Zero-shot Emotion Style Transfer (ZEST), that replaces the emotional content of the given source audio with that of the target audio while retaining the speaker and speech content from the source. The proposed system builds upon decomposing speech into semantic tokens, speaker representations and emotion embeddings. Using these factors, we propose a framework to reconstruct the pitch contour of the given speech signal and train a decoder that reconstructs the speech signal. The model is trained using a self-supervision based reconstruction loss. During conversion, only the emotion embedding is derived from the target audio, while the rest of the factors are derived from the source audio. In our experiments, we show that, even without using parallel training data or labels from the source or target audio, the proposed ZEST model exhibits zero-shot emotion transfer capabilities in objective and subjective quality evaluations.
Submitted 9 January, 2024;
originally announced January 2024.
-
Quantum Algorithm for Signal Denoising
Authors:
Sayantan Dutta,
Adrian Basarab,
Denis Kouamé,
Bertrand Georgeot
Abstract:
This letter presents a novel quantum algorithm for signal denoising, which performs a thresholding in the frequency domain through amplitude amplification and using an adaptive threshold determined by local mean values. The proposed algorithm is able to process both classical and quantum signals. It is parametrically faster than previous classical and quantum denoising algorithms. Numerical results show that it is efficient at removing noise of both classical and quantum origin, significantly outperforming existing quantum algorithms in this respect, especially in the presence of quantum noise.
Submitted 24 December, 2023;
originally announced December 2023.
-
Exploring the Emotional Landscape of Music: An Analysis of Valence Trends and Genre Variations in Spotify Music Data
Authors:
Shruti Dutta,
Shashwat Mookherjee
Abstract:
This paper conducts an intricate analysis of musical emotions and trends using Spotify music data, encompassing audio features and valence scores extracted through the Spotipi API. Employing regression modeling, temporal analysis, mood transitions, and genre investigation, the study uncovers patterns within music-emotion relationships. Four regression models (linear, support vector, random forest, and ridge) are employed to predict valence scores. Temporal analysis reveals shifts in valence distribution over time, while mood transition exploration illuminates emotional dynamics within playlists. The research contributes nuanced insights into music's emotional fabric, enhancing comprehension of the interplay between music and emotions over the years.
△ Less
Submitted 29 October, 2023;
originally announced October 2023.
-
A Centralized Voltage Controller for Offshore Wind Plants: NY State Grid Case Study
Authors:
Lin Zhu,
Bruno Leonardi,
Aboutaleb Haddadi,
Sudipta Dutta,
Alberto Del Rosso,
Victor Paduani,
Hossein Hooshyar
Abstract:
This paper proposes a centralized multi-plant reactive power and voltage controller to support voltage control in the interconnected onshore power system. This controller utilizes a hierarchical control structure consisting of a master controller and multiple slave controllers. To validate the proposed method, a realistic planning case of the New York State grid is created for the year 2035, in which nearly 9,500 MW AC or DC connected offshore wind resources are modeled. The performance of the proposed controller is analyzed in the large-scale model under three realistic disturbance scenarios: generator loss, load ramps, and load steps. Results demonstrate how the controller can adequately perform under disturbances to share reactive support proportionally among plants based on their ratings and improve grid voltage stability margins.
Submitted 19 October, 2023;
originally announced October 2023.
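The abstract above reports that reactive support is shared among plants in proportion to their ratings. A minimal sketch of that proportional split follows; it is an illustration only, not the paper's controller, and the function name, plant names, and numbers are all hypothetical.

```python
def share_reactive_power(total_mvar, ratings_mva):
    """Split a total reactive power request among plants in proportion to rating.

    total_mvar: total reactive power requested by the master controller (assumed input).
    ratings_mva: mapping of plant name -> plant rating (hypothetical values).
    """
    total_rating = sum(ratings_mva.values())
    if total_rating <= 0:
        raise ValueError("at least one plant must have a positive rating")
    # Each plant receives a share equal to its fraction of the total rating.
    return {name: total_mvar * rating / total_rating
            for name, rating in ratings_mva.items()}


# Hypothetical example: three plants sharing a 300 Mvar request.
setpoints = share_reactive_power(
    300.0, {"plant_a": 800.0, "plant_b": 1200.0, "plant_c": 1000.0}
)
```

A real controller would also have to respect per-plant capability limits and measured voltages, which this sketch ignores.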
-
Automated Identification of Failure Cases in Organ at Risk Segmentation Using Distance Metrics: A Study on CT Data
Authors:
Amin Honarmandi Shandiz,
Attila Rádics,
Rajesh Tamada,
Makk Árpád,
Karolina Glowacka,
Lehel Ferenczi,
Sandeep Dutta,
Michael Fanariotis
Abstract:
Automated organ at risk (OAR) segmentation is crucial for radiation therapy planning in CT scans, but the contours generated by automated models can be inaccurate, potentially leading to treatment planning issues. The reasons for these inaccuracies could be varied, such as unclear organ boundaries or inaccurate ground truth due to annotation errors. To improve the model's performance, it is necessary to identify these failure cases during the training process and to correct them with some potential post-processing techniques. However, this process can be time-consuming, as traditionally it requires manual inspection of the predicted output. This paper proposes a method to automatically identify failure cases by setting a threshold for the combination of Dice and Hausdorff distances. This approach reduces the time-consuming task of visually inspecting predicted outputs, allowing for faster identification of failure case candidates. The method was evaluated on 20 cases of six different organs in CT images from clinical expert curated datasets. By setting the thresholds for the Dice and Hausdorff distances, the study was able to differentiate between various states of failure cases and evaluate over 12 cases visually. This thresholding approach could be extended to other organs, leading to faster identification of failure cases and thereby improving the quality of radiation therapy planning.
Submitted 21 August, 2023;
originally announced August 2023.
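The abstract above flags failure cases by thresholding Dice and Hausdorff measures. The sketch below shows one way such a rule could look. It assumes masks are given as sets of voxel coordinates, uses a brute-force Hausdorff distance, and combines the two metrics with a simple OR rule; the thresholds `dice_min` and `hd_max` are hypothetical and are not taken from the paper.

```python
import math


def dice(pred, truth):
    """Dice overlap between two sets of voxel coordinates (1.0 means identical)."""
    if not pred and not truth:
        return 1.0
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


def hausdorff(pred, truth):
    """Symmetric Hausdorff distance between two non-empty point sets (brute force)."""
    def directed(a, b):
        # Largest distance from a point in a to its nearest point in b.
        return max(min(math.dist(p, q) for q in b) for p in a)
    return max(directed(pred, truth), directed(truth, pred))


def is_failure_candidate(pred, truth, dice_min=0.8, hd_max=5.0):
    """Flag a case when overlap is low OR the worst boundary error is large.

    The OR combination and both thresholds are assumptions for illustration.
    """
    if not pred or not truth:
        return True
    return dice(pred, truth) < dice_min or hausdorff(pred, truth) > hd_max


# Hypothetical 2D example: a 10x10 square organ.
truth = {(x, y) for x in range(10) for y in range(10)}
shifted = {(x + 1, y) for (x, y) in truth}   # small shift: high Dice, small distance
outlier = truth | {(30, 30)}                 # one stray voxel: high Dice, large distance
```

The `outlier` case illustrates why the two metrics are complementary: a single stray voxel barely changes Dice but dominates the Hausdorff distance.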
-
Multi-objective Anti-swing Trajectory Planning of Double-pendulum Tower Crane Operations using Opposition-based Evolutionary Algorithm
Authors:
Souravik Dutta,
Yiyu Cai,
Jianmin Zheng
Abstract:
Underactuated tower crane lifting requires time-energy optimal trajectories for the trolley/slew operations and reduction of the unactuated swings resulting from the trolley/jib motion. In scenarios involving non-negligible hook mass or long rig-cable, the hook-payload unit exhibits double-pendulum behaviour, making the problem highly challenging. This article introduces an offline multi-objective anti-swing trajectory planning module for a Computer-Aided Lift Planning (CALP) system of autonomous double-pendulum tower cranes, addressing all the transient state constraints. A set of auxiliary outputs are selected by methodically analyzing the payload swing dynamics and are used to prove the differential flatness property of the crane operations. The flat outputs are parameterized via suitable Bézier curves to formulate the multi-objective trajectory optimization problems in the flat output space. A novel multi-objective evolutionary algorithm called Collective Oppositional Generalized Differential Evolution 3 (CO-GDE3) is employed as the optimizer. To obtain faster convergence and better consistency in getting a wide range of good solutions, a new population initialization strategy is integrated into the conventional GDE3. The computationally efficient initialization method incorporates various concepts of computational opposition. Statistical comparisons based on trolley and slew operations verify the superiority of convergence and reliability of CO-GDE3 over the standard GDE3. Trolley and slew operations of a collision-free lifting path computed via the path planner of the CALP system are selected for a simulation study. The simulated trajectories demonstrate that the proposed planner can produce time-energy optimal solutions, keeping all the state variables within their respective limits and restricting the hook and payload swings.
Submitted 30 May, 2023;
originally announced May 2023.
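The abstract above parameterizes flat outputs with Bézier curves. As a rough illustration of that building block only (not the authors' planner or the CO-GDE3 optimizer), the sketch below evaluates a scalar Bézier curve with the Bernstein basis. The control points are hypothetical trolley positions.

```python
from math import comb


def bezier(control_points, t):
    """Evaluate a scalar Bezier curve at t in [0, 1] using the Bernstein basis."""
    n = len(control_points) - 1
    return sum(comb(n, i) * (t ** i) * ((1 - t) ** (n - i)) * p
               for i, p in enumerate(control_points))


# Hypothetical trolley positions (metres) used as control points.
trolley_ctrl = [0.0, 1.0, 3.0, 4.0]
start = bezier(trolley_ctrl, 0.0)   # equals the first control point
mid = bezier(trolley_ctrl, 0.5)
end = bezier(trolley_ctrl, 1.0)     # equals the last control point
```

Because the curve passes through its first and last control points, boundary conditions on the start and end positions can be imposed directly on those two points, leaving the interior points as decision variables for an optimizer.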
-
HCAM -- Hierarchical Cross Attention Model for Multi-modal Emotion Recognition
Authors:
Soumya Dutta,
Sriram Ganapathy
Abstract:
Emotion recognition in conversations is challenging due to the multi-modal nature of the emotion expression. We propose a hierarchical cross-attention model (HCAM) approach to multi-modal emotion recognition using a combination of recurrent and co-attention neural network models. The input to the model consists of two modalities: i) audio data, processed through a learnable wav2vec approach, and ii) text data, represented using a bidirectional encoder representations from transformers (BERT) model. The audio and text representations are processed using a set of bi-directional recurrent neural network layers with self-attention that converts each utterance in a given conversation to a fixed dimensional embedding. In order to incorporate contextual knowledge and the information across the two modalities, the audio and text embeddings are combined using a co-attention layer that attempts to weigh the utterance level embeddings relevant to the task of emotion recognition. The neural network parameters in the audio layers, text layers, and multi-modal co-attention layers are hierarchically trained for the emotion classification task. We perform experiments on three established datasets, namely IEMOCAP, MELD and CMU-MOSI, where we illustrate that the proposed model improves significantly over other benchmarks and helps achieve state-of-the-art results on all these datasets.
Submitted 9 January, 2024; v1 submitted 13 April, 2023;
originally announced April 2023.
-
A Unified Approach to Optimally Solving Sensor Scheduling and Sensor Selection Problems in Kalman Filtering
Authors:
Shamak Dutta,
Nils Wilde,
Stephen L. Smith
Abstract:
We consider a general form of the sensor scheduling problem for state estimation of linear dynamical systems, which involves selecting sensors that minimize the trace of the Kalman filter error covariance (weighted by a positive semidefinite matrix) subject to polyhedral constraints on the selected sensors. This general form captures several well-studied problems including sensor placement, sensor scheduling with budget constraints, and Linear Quadratic Gaussian (LQG) control and sensing co-design. We present a mixed integer optimization approach that is derived by exploiting the optimality of the Kalman filter. While existing work has focused on approximate methods to specific problem variants, our work provides a unified approach to computing optimal solutions to the general version of sensor scheduling. In simulation, we show this approach finds optimal solutions for systems with 30 to 50 states in seconds.
Submitted 11 December, 2023; v1 submitted 5 April, 2023;
originally announced April 2023.
-
DIVA: Deep Unfolded Network from Quantum Interactive Patches for Image Restoration
Authors:
Sayantan Dutta,
Adrian Basarab,
Bertrand Georgeot,
Denis Kouamé
Abstract:
This paper presents a deep neural network called DIVA unfolding a baseline adaptive denoising algorithm (De-QuIP), relying on the theory of quantum many-body physics. Furthermore, it is shown that with very slight modifications, this network can be enhanced to solve more challenging image restoration tasks such as image deblurring, super-resolution and inpainting. Despite a compact and interpretable (from a physical perspective) architecture, the proposed deep learning network outperforms several recent algorithms from the literature, designed specifically for each task. The key ingredients of the proposed method are on one hand, its ability to handle non-local image structures through the patch-interaction term and the quantum-based Hamiltonian operator, and, on the other hand, its flexibility to adapt the hyperparameters patch-wisely, due to the training process.
Submitted 31 December, 2022;
originally announced January 2023.
-
Design of a Strong-Arm Dynamic-Latch based comparator with high speed, low power and low offset for SAR-ADC
Authors:
Sounak Dutta
Abstract:
Comparators are utilised by Nyquist-rate and oversampling analog to digital converters (ADCs) to accomplish quantization and perhaps sampling. Thus, comparators have a substantial effect on the speed and accuracy of ADCs. This study provides a revised design for a dynamic-latch-based comparator that achieves the lowest latency, maximum area-efficient realisation, reduced power dissipation, and low offset. The proposed circuit has been designed and simulated using the GPDK 45 nm standard CMOS process to operate on a 100 MHz clock at a 1.2 V supply voltage. Design and simulation have been carried out using the CADENCE Virtuoso EDA tool. Compared to the original design, the PDP was reduced by approximately 6% and the offset voltage by 8 mV, without a speed trade-off.
Submitted 25 October, 2022; v1 submitted 9 September, 2022;
originally announced September 2022.
-
Informative Path Planning in Random Fields via Mixed Integer Programming
Authors:
Shamak Dutta,
Nils Wilde,
Stephen L. Smith
Abstract:
We present a new mixed integer formulation for the discrete informative path planning problem in random fields. The objective is to compute a budget constrained path while collecting measurements whose linear estimate results in minimum error over a finite set of prediction locations. The problem is known to be NP-hard. However, we strive to compute optimal solutions by leveraging advances in mixed integer optimization. Our approach is based on expanding the search space so we optimize not only over the collected measurement subset, but also over the class of all linear estimators. This allows us to formulate a mixed integer quadratic program that is convex in the continuous variables. The formulations are general and are not restricted to any covariance structure of the field. In simulations, we demonstrate the effectiveness of our approach over previous branch and bound algorithms.
Submitted 20 April, 2022;
originally announced April 2022.
-
Can No-reference features help in Full-reference image quality estimation?
Authors:
Saikat Dutta,
Sourya Dipta Das,
Nisarg A. Shah
Abstract:
Development of perceptual image quality assessment (IQA) metrics has been of significant interest to the computer vision community. The aim of these metrics is to model the quality of an image as perceived by humans. Recent works in full-reference IQA research perform pixelwise comparison between deep features corresponding to query and reference images for quality prediction. However, pixelwise feature comparison may not be meaningful if the distortion present in the query image is severe. In this context, we explore the utilization of no-reference features in the full-reference IQA task. Our model consists of both full-reference and no-reference branches. The full-reference branches use both distorted and reference images, whereas the no-reference branch only uses the distorted image. Our experiments show that the use of no-reference features boosts the performance of image quality assessment. Our model achieves higher SRCC and KRCC scores than a number of state-of-the-art algorithms on the KADID-10K and PIPAL datasets.
Submitted 1 March, 2022;
originally announced March 2022.
-
A Novel Image Denoising Algorithm Using Concepts of Quantum Many-Body Theory
Authors:
Sayantan Dutta,
Adrian Basarab,
Bertrand Georgeot,
Denis Kouamé
Abstract:
Sparse representation of real-life images is a very effective approach in imaging applications, such as denoising. In recent years, with the growth of computing power, data-driven strategies exploiting the redundancy within patches extracted from one or several images to increase sparsity have become more prominent. This paper presents a novel image denoising algorithm exploiting such an image-dependent basis inspired by quantum many-body theory. Based on patch analysis, the similarity measures in a local image neighborhood are formalized through a term akin to interaction in quantum mechanics that can efficiently preserve the local structures of real images. The versatile nature of this adaptive basis extends the scope of its application to image-independent or image-dependent noise scenarios without any adjustment. We carry out a rigorous comparison with contemporary methods to demonstrate the denoising capability of the proposed algorithm regardless of the image characteristics, noise statistics and intensity. We illustrate the properties of the hyperparameters and their respective effects on the denoising performance, together with automated rules for selecting values close to the optimal ones in experimental setups where ground truth is not available. Finally, we show the ability of our approach to deal with practical image denoising problems such as medical ultrasound image despeckling.
Submitted 24 August, 2022; v1 submitted 16 December, 2021;
originally announced December 2021.
-
Image Denoising Inspired by Quantum Many-Body physics
Authors:
Sayantan Dutta,
Adrian Basarab,
Bertrand Georgeot,
Denis Kouamé
Abstract:
Decomposing an image through Fourier, DCT or wavelet transforms is still a common approach in digital image processing, in a number of applications such as denoising. In this context, data-driven dictionaries, and in particular those exploiting the redundancy within patches extracted from one or several images, allowed important improvements. This paper proposes an original idea of constructing such an image-dependent basis inspired by the principles of quantum many-body physics. The similarity between two image patches is introduced in the formalism through a term akin to interaction terms in quantum mechanics. The main contribution of the paper is thus to introduce this original way of exploiting quantum many-body ideas in image processing, which opens interesting perspectives in image denoising. The potential of the proposed adaptive decomposition is illustrated through image denoising in the presence of additive white Gaussian noise, but the method can be used for other types of noise such as image-dependent noise as well. Finally, the results show that our method achieves comparable or slightly better results than existing approaches.
Submitted 31 August, 2021;
originally announced August 2021.
-
Plug-and-Play Quantum Adaptive Denoiser for Deconvolving Poisson Noisy Images
Authors:
Sayantan Dutta,
Adrian Basarab,
Bertrand Georgeot,
Denis Kouamé
Abstract:
A new Plug-and-Play (PnP) alternating direction method of multipliers (ADMM) scheme is proposed in this paper, by embedding a recently introduced adaptive denoiser using the Schroedinger equation's solutions of quantum physics. The potential of the proposed model is studied for Poisson image deconvolution, which is a common problem occurring in a number of imaging applications, such as limited photon acquisition or X-ray computed tomography. Numerical results show the efficiency and good adaptability of the proposed scheme compared to recent state-of-the-art techniques, for both high and low signal-to-noise ratio scenarios. This performance gain, regardless of the amount of noise affecting the observations, is explained by the flexibility of the embedded quantum denoiser, constructed without anticipating any prior statistics about the noise, which is one of the main advantages of this method. The main novelty of this work resides in the integration of a modified quantum denoiser into the PnP-ADMM framework and the numerical proof of convergence of the resulting algorithm.
Submitted 20 October, 2021; v1 submitted 1 July, 2021;
originally announced July 2021.
-
Fast and Accurate Quantized Camera Scene Detection on Smartphones, Mobile AI 2021 Challenge: Report
Authors:
Andrey Ignatov,
Grigory Malivenko,
Radu Timofte,
Sheng Chen,
Xin Xia,
Zhaoyan Liu,
Yuwei Zhang,
Feng Zhu,
Jiashi Li,
Xuefeng Xiao,
Yuan Tian,
Xinglong Wu,
Christos Kyrkou,
Yixin Chen,
Zexin Zhang,
Yunbo Peng,
Yue Lin,
Saikat Dutta,
Sourya Dipta Das,
Nisarg A. Shah,
Himanshu Kumar,
Chao Ge,
Pei-Lin Wu,
Jin-Hua Du,
Andrew Batutin,
et al. (6 additional authors not shown)
Abstract:
Camera scene detection is among the most popular computer vision problems on smartphones. While many custom solutions were developed for this task by phone vendors, none of the designed models were available publicly up until now. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop quantized deep learning-based camera scene classification solutions that can demonstrate a real-time performance on smartphones and IoT platforms. For this, the participants were provided with a large-scale CamSDD dataset consisting of more than 11K images belonging to the 30 most important scene categories. The runtime of all models was evaluated on the popular Apple Bionic A11 platform that can be found in many iOS devices. The proposed solutions are fully compatible with all major mobile AI accelerators and can demonstrate more than 100-200 FPS on the majority of recent smartphone platforms while achieving a top-3 accuracy of more than 98%. A detailed description of all models developed in the challenge is provided in this paper.
Submitted 17 May, 2021;
originally announced May 2021.
-
Efficient Space-time Video Super Resolution using Low-Resolution Flow and Mask Upsampling
Authors:
Saikat Dutta,
Nisarg A. Shah,
Anurag Mittal
Abstract:
This paper explores an efficient solution for Space-time Super-Resolution, aiming to generate High-resolution Slow-motion videos from Low Resolution and Low Frame rate videos. A simplistic solution is to run Video Super Resolution and Video Frame Interpolation models sequentially. However, solutions of this type are memory inefficient, have high inference time, and cannot make proper use of the space-time relation property. To this end, we first interpolate in LR space using quadratic modeling. Input LR frames are super-resolved using a state-of-the-art Video Super-Resolution method. The flowmaps and blending mask used to synthesize the LR interpolated frame are reused in HR space using bilinear upsampling. This leads to a coarse estimate of the HR intermediate frame, which often contains artifacts along motion boundaries. We use a refinement network to improve the quality of the HR intermediate frame via residual learning. Our model is lightweight and performs better than current state-of-the-art models on the REDS STSR Validation set.
Submitted 8 June, 2021; v1 submitted 12 April, 2021;
originally announced April 2021.
-
AIM 2020 Challenge on Rendering Realistic Bokeh
Authors:
Andrey Ignatov,
Radu Timofte,
Ming Qian,
Congyu Qiao,
Jiamin Lin,
Zhenyu Guo,
Chenghua Li,
Cong Leng,
Jian Cheng,
Juewen Peng,
Xianrui Luo,
Ke Xian,
Zijin Wu,
Zhiguo Cao,
Densen Puthussery,
Jiji C V,
Hrishikesh P S,
Melvin Kuriakose,
Saikat Dutta,
Sourya Dipta Das,
Nisarg A. Shah,
Kuldeep Purohit,
Praveen Kandula,
Maitreya Suin,
A. N. Rajagopalan,
et al. (10 additional authors not shown)
Abstract:
This paper reviews the second AIM realistic bokeh effect rendering challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world bokeh simulation problem, where the goal was to learn a realistic shallow focus technique using a large-scale EBB! bokeh dataset consisting of 5K shallow / wide depth-of-field image pairs captured using the Canon 7D DSLR camera. The participants had to render the bokeh effect based on only one single frame without any additional data from other cameras or sensors. The target metric used in this challenge combined the runtime and the perceptual quality of the solutions measured in the user study. To ensure the efficiency of the submitted models, we measured their runtime on standard desktop CPUs as well as on smartphone GPUs. The proposed solutions significantly improved the baseline results, defining the state-of-the-art for the practical bokeh effect rendering problem.
Submitted 10 November, 2020;
originally announced November 2020.
-
Poisson Image Deconvolution by a Plug-and-Play Quantum Denoising Scheme
Authors:
Sayantan Dutta,
Adrian Basarab,
Bertrand Georgeot,
Denis Kouamé
Abstract:
This paper introduces a new Plug-and-Play (PnP) alternating direction method of multipliers (ADMM) scheme based on a recently proposed denoiser using the Schroedinger equation's solutions of quantum physics. The efficiency of the proposed algorithm is evaluated for Poisson image deconvolution, which is very common in imaging applications such as limited photon acquisition. Numerical results show the superiority of the proposed scheme compared to recent state-of-the-art techniques, for both low and high signal-to-noise-ratio scenarios. This performance gain is mostly explained by the flexibility of the embedded quantum denoiser for different types of noise affecting the observations.
Submitted 10 May, 2021; v1 submitted 19 October, 2020;
originally announced October 2020.
-
Depth-aware Blending of Smoothed Images for Bokeh Effect Generation
Authors:
Saikat Dutta
Abstract:
The bokeh effect is used in photography to capture images where the closer objects look sharp and everything else stays out-of-focus. Bokeh photos are generally captured using Single Lens Reflex cameras with a shallow depth-of-field. Most modern smartphones can take bokeh images by leveraging dual rear cameras or good auto-focus hardware. However, for smartphones with a single rear camera and without good auto-focus hardware, we have to rely on software to generate bokeh images. This kind of system is also useful for generating the bokeh effect in already captured images. In this paper, an end-to-end deep learning framework is proposed to generate a high-quality bokeh effect from images. The original image and different versions of smoothed images are blended to generate the bokeh effect with the help of a monocular depth estimation network. The proposed approach is compared against a saliency detection based baseline and a number of approaches proposed in the AIM 2019 Challenge on Bokeh Effect Synthesis. Extensive experiments are shown in order to understand different parts of the proposed algorithm. The network is lightweight and can process an HD image in 0.03 seconds. This approach ranked second in the AIM 2019 Bokeh effect challenge-Perceptual Track.
Submitted 28 May, 2020;
originally announced May 2020.
-
Quantum mechanics-based signal and image representation: application to denoising
Authors:
Sayantan Dutta,
Adrian Basarab,
Bertrand Georgeot,
Denis Kouamé
Abstract:
Decomposition of digital signals and images into other basis or dictionaries than time or space domains is a very common approach in signal and image processing and analysis. Such a decomposition is commonly obtained using fixed transforms (e.g., Fourier or wavelet) or dictionaries learned from example databases or from the signal or image itself. In this work, we investigate in detail a new appro…
▽ More
Decomposition of digital signals and images into other basis or dictionaries than time or space domains is a very common approach in signal and image processing and analysis. Such a decomposition is commonly obtained using fixed transforms (e.g., Fourier or wavelet) or dictionaries learned from example databases or from the signal or image itself. In this work, we investigate in detail a new approach of constructing such a signal or image-dependent bases inspired by quantum mechanics tools, i.e., by considering the signal or image as a potential in the discretized Schroedinger equation. To illustrate the potential of the proposed decomposition, denoising results are reported in the case of Gaussian, Poisson, and speckle noise and compared to the state of the art algorithms based on wavelet shrinkage, total variation regularization or patch-wise sparse coding in learned dictionaries, non-local means image denoising, and graph signal processing.
Submitted 16 March, 2021; v1 submitted 2 April, 2020;
originally announced April 2020.
-
Energy and Latency of Beamforming Architectures for Initial Access in mmWave Wireless Networks
Authors:
C. Nicolas Barati,
Sourjya Dutta,
Sundeep Rangan,
Ashutosh Sabharwal
Abstract:
Future millimeter-wave (mmWave) systems, 5G cellular or WiFi, must rely on highly directional links to overcome severe pathloss in these frequency bands. Establishing such links requires the mutual discovery of the transmitter and the receiver in the angular domain, potentially leading to a large latency and high energy consumption. In this work, we show that both the discovery latency and energy consumption can be significantly reduced by using fully digital front-ends. In fact, we establish that by reducing the resolution of the fully digital front-ends we can achieve lower energy consumption compared to both analog and high-resolution digital beamformers. Since beamforming through analog front-ends allows sampling in only one direction at a time, the mobile device is "on" for a longer time compared to a digital beamformer, which can get spatial samples from all directions in one shot. We show that the energy consumed by the analog front-end can be four to six times more than that of the digital front-ends, depending on the size of the employed antenna arrays. We recognize, however, that using fully digital beamforming post beam discovery, i.e., for data transmission, is not viable from a power consumption standpoint. To address this issue, we propose the use of digital beamformers with low-resolution analog-to-digital converters (4 bits). This reduction in resolution brings the power consumption to the same level as analog beamforming for data transmissions while benefiting from the spatial multiplexing capabilities of fully digital beamforming, thus reducing initial discovery latency and improving energy efficiency.
Submitted 7 January, 2020;
originally announced January 2020.
-
Power Efficient Discontinuous Reception in THz and mmWave Wireless Systems
Authors:
Syed Hashim Ali Shah,
Sundar Aditya,
Sourjya Dutta,
Christopher Slezak,
Sundeep Rangan
Abstract:
Discontinuous reception (DRX), where a user equipment (UE) temporarily disables its receiver, is a critical power saving feature in modern cellular systems. DRX is likely to be particularly aggressively used in the mmWave and THz frequencies due to the high front end power consumption. A key challenge of DRX in these frequencies is that individual links are directional and highly susceptible to blockage. MmWave and THz UEs will therefore likely need to monitor multiple cells in multiple directions to ensure continuous reliable connectivity. This work proposes a novel, heuristic algorithm to dynamically select the cells to monitor to attempt to optimally trade off link reliability and power consumption. The paper provides preliminary estimates of connected mode DRX mode consumption using detailed and realistic statistical models of blockers at both 28 and 140 GHz. It is found that although blockage dynamics are faster at 140 GHz, reliable connectivity at low power can be maintained with sufficient macro-diversity and link prediction.
Submitted 23 August, 2019;
originally announced August 2019.