-
Bayesian Quantum Orthogonal Neural Networks for Anomaly Detection
Authors:
Natansh Mathur,
Brian Coyle,
Nishant Jain,
Snehal Raj,
Akshat Tandon,
Jasper Simon Krauser,
Rainer Stoessel
Abstract:
Identification of defects or anomalies in 3D objects is a crucial task to ensure correct functionality. In this work, we combine Bayesian learning with recent developments in quantum and quantum-inspired machine learning, specifically orthogonal neural networks, to tackle this anomaly detection problem for an industrially relevant use case. Bayesian learning enables uncertainty quantification of predictions, while orthogonality in weight matrices enables smooth training. We develop orthogonal (quantum) versions of 3D convolutional neural networks and show that these models can successfully detect anomalies in 3D objects. To test the feasibility of incorporating quantum computers into a quantum-enhanced anomaly detection pipeline, we perform hardware experiments with our models on IBM's 127-qubit Brisbane device, testing the effect of noise and limited measurement shots.
Submitted 25 April, 2025;
originally announced April 2025.
-
Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy
Authors:
Brian Thompson,
Nitika Mathur,
Daniel Deutsch,
Huda Khayrallah
Abstract:
Selecting an automatic metric that best emulates human annotators is often non-trivial, because there is no clear definition of "best emulates." A meta-metric is required to compare the human judgments to the automatic metric scores, and metric rankings depend on the choice of meta-metric. We propose Soft Pairwise Accuracy (SPA), a new meta-metric that builds on Pairwise Accuracy (PA) but incorporates the statistical significance of both the human judgments and the metric scores. We show that SPA is more stable than PA with respect to changes in the number of systems/segments used for evaluation. We also show that PA can only assign a small set of distinct output values to metrics, and this results in many metrics being artificially assigned the exact same PA score. We demonstrate that SPA fixes this issue. Finally, we show that SPA is more discriminative than PA, producing more statistically significant comparisons between metrics. SPA was selected as the official system-level metric for the 2024 WMT Metrics Shared Task.
Submitted 4 October, 2024; v1 submitted 14 September, 2024;
originally announced September 2024.
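The Pairwise Accuracy (PA) meta-metric that the abstract above builds on can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name, the score lists, and the choice to count ties as disagreements are all assumptions made here. It shows why PA can take only a small set of distinct values: with n systems there are n(n-1)/2 pairs, so PA is always a multiple of 1 over that count.

```python
from itertools import combinations


def pairwise_accuracy(human, metric):
    """Fraction of system pairs that the metric orders the same way as humans.

    Assumption: a tie in either score list counts as a disagreement.
    """
    pairs = list(combinations(range(len(human)), 2))
    agree = sum(
        1
        for i, j in pairs
        if (human[i] - human[j]) * (metric[i] - metric[j]) > 0
    )
    return agree / len(pairs)


# Hypothetical system-level scores for four systems (illustrative only).
human = [0.9, 0.7, 0.4, 0.1]
metric_a = [0.8, 0.6, 0.5, 0.2]  # same ordering as the human scores
metric_b = [0.8, 0.5, 0.6, 0.2]  # systems 1 and 2 are swapped

print(pairwise_accuracy(human, metric_a))  # all 6 pairs agree
print(pairwise_accuracy(human, metric_b))  # 5 of 6 pairs agree
```

With four systems, PA can only land on one of seven values (0/6 through 6/6), so distinct metrics easily receive identical scores, which is the coarseness the abstract says SPA addresses.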
-
MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification
Authors:
Phu Pham,
Aradhya N. Mathur,
Ojaswa Sharma,
Aniket Bera
Abstract:
The field of text-to-3D content generation has made significant progress in generating realistic 3D objects, with existing methodologies like Score Distillation Sampling (SDS) offering promising guidance. However, these methods often encounter the "Janus" problem: multi-face ambiguities due to imprecise guidance. Additionally, while recent advancements in 3D Gaussian splatting have shown its efficacy in representing 3D volumes, optimization of this representation remains largely unexplored. This paper introduces a unified framework for text-to-3D content generation that addresses these critical gaps. Our approach utilizes multi-view guidance to iteratively form the structure of the 3D model, progressively enhancing detail and accuracy. We also introduce a novel densification algorithm that aligns Gaussians close to the surface, optimizing the structural integrity and fidelity of the generated models. Extensive experiments validate our approach, demonstrating that it produces high-quality visual outputs with minimal time cost. Notably, our method achieves high-quality results within half an hour of training, offering a substantial efficiency gain over most existing methods, which require hours of training time to achieve comparable results.
Submitted 10 September, 2024;
originally announced September 2024.
-
Curvy: A Parametric Cross-section based Surface Reconstruction
Authors:
Aradhya N. Mathur,
Apoorv Khattar,
Ojaswa Sharma
Abstract:
In this work, we present a novel approach for reconstructing shape point clouds using planar sparse cross-sections with the help of generative modeling. We present unique challenges pertaining to the representation and reconstruction in this problem setting. Most methods in the classical literature lack the ability to generalize based on object class and employ complex mathematical machinery to reconstruct reliable surfaces. We present a simple learnable approach to generate a large number of points from a small number of input cross-sections over a large dataset. We use a compact parametric polyline representation with adaptive splitting to represent the cross-sections, and perform learning with a Graph Neural Network to reconstruct the underlying shape adaptively, reducing the dependence on the number of cross-sections provided.
Submitted 1 September, 2024;
originally announced September 2024.
-
Categorizing Sources of Information for Explanations in Conversational AI Systems for Older Adults Aging in Place
Authors:
Niharika Mathur,
Tamara Zubatiy,
Elizabeth Mynatt
Abstract:
As the permeability of AI systems in interpersonal domains like the home expands, their technical capabilities of generating explanations are required to be aligned with user expectations for transparency and reasoning. This paper presents insights from our ongoing work in understanding the effectiveness of explanations in Conversational AI systems for older adults aging in place and their family caregivers. We argue that in collaborative and multi-user environments like the home, AI systems will make recommendations based on a host of information sources to generate explanations. These sources may be more or less salient based on user mental models of the system and the specific task. We highlight the need for cross-technological collaboration between AI systems and other available sources of information in the home to generate multiple explanations for a single user query. Through example scenarios in a caregiving home setting, this paper provides an initial framework for categorizing these sources and informing a potential design space for AI explanations surrounding everyday tasks in the home.
Submitted 7 June, 2024;
originally announced June 2024.
-
Training-efficient density quantum machine learning
Authors:
Brian Coyle,
Snehal Raj,
Natansh Mathur,
El Amine Cherrat,
Nishant Jain,
Skander Kazdaghli,
Iordanis Kerenidis
Abstract:
Quantum machine learning (QML) requires powerful, flexible and efficiently trainable models to be successful in solving challenging problems. We introduce density quantum neural networks, a model family that prepares mixtures of trainable unitaries, with a distributional constraint over coefficients. This framework balances expressivity and efficient trainability, especially on quantum hardware. For expressivity, the Hastings-Campbell Mixing lemma converts benefits from linear combination of unitaries into density models with similar performance guarantees but shallower circuits. For trainability, commuting-generator circuits enable density model construction with efficiently extractable gradients. The framework connects to various facets of QML including post-variational and measurement-based learning. In classical settings, density models naturally integrate the mixture of experts formalism, and offer natural overfitting mitigation. The framework is versatile - we uplift several quantum models into density versions to improve model performance, or trainability, or both. These include Hamming weight-preserving and equivariant models, among others. Extensive numerical experiments validate our findings.
Submitted 23 May, 2025; v1 submitted 30 May, 2024;
originally announced May 2024.
-
SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout
Authors:
Ayan Banerjee,
Nityanand Mathur,
Josep Lladós,
Umapada Pal,
Anjan Dutta
Abstract:
Generating VectorArt from text prompts is a challenging vision task, requiring diverse yet realistic depictions of the seen as well as unseen entities. However, existing research has been mostly limited to the generation of single objects, rather than comprehensive scenes comprising multiple elements. In response, this work introduces SVGCraft, a novel end-to-end framework for the creation of vector graphics depicting entire scenes from textual descriptions. Utilizing a pre-trained LLM for layout generation from text prompts, this framework introduces a technique for producing masked latents in specified bounding boxes for accurate object placement. It introduces a fusion mechanism for integrating attention maps and employs a diffusion U-Net for coherent composition, speeding up the drawing process. The resulting SVG is optimized using a pre-trained encoder and LPIPS loss with opacity modulation to maximize similarity. Additionally, this work explores the potential of primitive shapes in facilitating canvas completion in constrained environments. Through both qualitative and quantitative assessments, SVGCraft is demonstrated to surpass prior works in abstraction, recognizability, and detail, as evidenced by its performance metrics (CLIP-T: 0.4563, Cosine Similarity: 0.6342, Confusion: 0.66, Aesthetic: 6.7832). The code will be available at https://github.com/ayanban011/SVGCraft.
Submitted 30 March, 2024;
originally announced April 2024.
-
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models
Authors:
Shyam Marjit,
Harshit Singh,
Nityanand Mathur,
Sayak Paul,
Chia-Mu Yu,
Pin-Yu Chen
Abstract:
In the realm of subject-driven text-to-image (T2I) generative models, recent developments like DreamBooth and BLIP-Diffusion have led to impressive results yet encounter limitations due to their intensive fine-tuning demands and substantial parameter requirements. While the low-rank adaptation (LoRA) module within DreamBooth offers a reduction in trainable parameters, it introduces a pronounced sensitivity to hyperparameters, leading to a compromise between parameter efficiency and the quality of T2I personalized image synthesis. Addressing these constraints, we introduce DiffuseKronA, a novel Kronecker product-based adaptation module that not only significantly reduces the parameter count by 35% and 99.947% compared to LoRA-DreamBooth and the original DreamBooth, respectively, but also enhances the quality of image synthesis. Crucially, DiffuseKronA mitigates the issue of hyperparameter sensitivity, delivering consistent high-quality generations across a wide range of hyperparameters, thereby diminishing the necessity for extensive fine-tuning. Furthermore, a more controllable decomposition makes DiffuseKronA more interpretable, and it can even achieve up to a 50% reduction with results comparable to LoRA-DreamBooth. Evaluated against diverse and complex input images and text prompts, DiffuseKronA consistently outperforms existing models, producing diverse images of higher quality with improved fidelity and a more accurate color distribution of objects, all the while upholding exceptional parameter efficiency, thus presenting a substantial advancement in the field of T2I generative modeling. Our project page, consisting of links to the code and pre-trained checkpoints, is available at https://diffusekrona.github.io/.
Submitted 28 February, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
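The parameter saving that a Kronecker product-based adapter can offer over a low-rank one, as described in the abstract above, can be illustrated with a small sketch. This is not the DiffuseKronA implementation: the helper names, the layer size, the factor shapes, and the LoRA rank are all hypothetical values chosen for illustration. It assumes the general idea that a weight update of shape (d_out, d_in) is formed as A ⊗ B, so only the entries of A and B are trained.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def kron_param_count(a_shape, b_shape):
    """Trainable entries when the update is A (x) B."""
    return a_shape[0] * a_shape[1] + b_shape[0] * b_shape[1]


def lora_param_count(d_out, d_in, rank):
    """Trainable entries when the update is a product of two rank-r factors."""
    return rank * (d_out + d_in)


# Tiny check of the product itself: a 2x2 (x) 2x2 gives a 4x4 matrix.
print(kron([[1, 2], [3, 4]], [[0, 1], [1, 0]]))

# Illustrative shapes only (not taken from the paper).
d_out, d_in = 768, 768
a_shape, b_shape = (32, 32), (24, 24)  # 32 * 24 = 768 on both axes
print(kron_param_count(a_shape, b_shape))     # 1600
print(lora_param_count(d_out, d_in, rank=4))  # 6144
```

Because rank(A ⊗ B) = rank(A) · rank(B), a Kronecker-factored update is not restricted to low rank the way a rank-r product is, even when it uses fewer trainable entries.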
-
RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation
Authors:
Aradhya N. Mathur,
Phu Pham,
Aniket Bera,
Ojaswa Sharma
Abstract:
3D generation has rapidly accelerated in the past decade owing to progress in the field of generative modeling. Score Distillation Sampling (SDS) based rendering has improved 3D asset generation to a great extent. Further, the recent work on Denoising Diffusion Policy Optimization (DDPO) demonstrates that the diffusion process is compatible with policy gradient methods and has been shown to improve 2D diffusion models using an aesthetic scoring function. We first show that this aesthetic scorer acts as a strong guide for a variety of SDS-based methods and demonstrate its effectiveness in text-to-3D synthesis. Further, we leverage the DDPO approach to improve the quality of the 3D rendering obtained from 2D diffusion models. Our approach, DDPO3D, employs the policy gradient method in tandem with aesthetic scoring. To the best of our knowledge, this is the first method that extends policy gradient methods to 3D score-based rendering and shows improvement across SDS-based methods such as DreamGaussian, which are currently driving research in text-to-3D synthesis. Our approach is compatible with score distillation-based methods, which would facilitate the integration of diverse reward functions into the generative process. Our project page can be accessed via https://ddpo3d.github.io.
Submitted 7 December, 2023;
originally announced December 2023.
-
CLIPDrawX: Primitive-based Explanations for Text Guided Sketch Synthesis
Authors:
Nityanand Mathur,
Shyam Marjit,
Abhra Chaudhuri,
Anjan Dutta
Abstract:
With the goal of understanding the visual concepts that CLIP associates with text prompts, we show that the latent space of CLIP can be visualized solely in terms of linear transformations on simple geometric primitives like circles and straight lines. Although existing approaches achieve this by sketch-synthesis-through-optimization, they do so on the space of Bézier curves, which exhibit a wastefully large set of structures that they can evolve into, as most of them are non-essential for generating meaningful sketches. We present CLIPDrawX, an algorithm that provides significantly better visualizations for CLIP text embeddings, using only simple primitive shapes like straight lines and circles. This constrains the set of possible outputs to linear transformations on these primitives, thereby exhibiting an inherently simpler mathematical form. The synthesis process of CLIPDrawX can be tracked end-to-end, with each visual concept being explained exclusively in terms of primitives. Implementation will be released upon acceptance. Project Page: https://clipdrawx.github.io/.
Submitted 4 December, 2023;
originally announced December 2023.
-
Improved Financial Forecasting via Quantum Machine Learning
Authors:
Sohum Thakkar,
Skander Kazdaghli,
Natansh Mathur,
Iordanis Kerenidis,
André J. Ferreira-Martins,
Samurai Brito
Abstract:
Quantum algorithms have the potential to enhance machine learning across a variety of domains and applications. In this work, we show how quantum machine learning can be used to improve financial forecasting. First, we use classical and quantum Determinantal Point Processes to enhance Random Forest models for churn prediction, improving precision by almost 6%. Second, we design quantum neural network architectures with orthogonal and compound layers for credit risk assessment, which match classical performance with significantly fewer parameters. Our results demonstrate that leveraging quantum ideas can effectively enhance the performance of machine learning, both today as quantum-inspired classical ML solutions, and even more in the future, with the advent of better quantum hardware.
Submitted 3 April, 2024; v1 submitted 31 May, 2023;
originally announced June 2023.
-
Assessing New Hires' Programming Productivity Through UMETRIX -- An Industry Case Study
Authors:
Sai Anirudh Karre,
Neeraj Mathur,
Y. Raghu Reddy
Abstract:
New hires (novice or experienced) usually undergo an onboarding program for a specific period to get acquainted with the processes of the hiring organization and reach expected programming productivity levels. This paper presents a programming productivity framework developed as an outcome of a three-year-long industry study with small to medium-scale organizations using a usability evaluation and code recommendation tool, UMETRIX, to manage new hire programming productivity. We developed a programming productivity framework around this tool called "Utpada". Participating organizations expressed strong interest in relying on this programming productivity framework to assess the skill gap among new hires. It helped identify under-performers early and strategize their upskilling plan according to business needs. The participating organizations have seen an 89% rise in quality code contributions by new hires during their probation period compared to traditional new hires. This framework is reproducible for any new-hire team size and can be easily integrated into existing programming productivity improvement programs.
Submitted 5 May, 2023;
originally announced May 2023.
-
Quantum Vision Transformers
Authors:
El Amine Cherrat,
Iordanis Kerenidis,
Natansh Mathur,
Jonas Landman,
Martin Strahm,
Yun Yvonna Li
Abstract:
In this work, quantum transformers are designed and analysed in detail by extending the state-of-the-art classical transformer neural network architectures known to be very performant in natural language processing and image analysis. Building upon previous work, which uses parametrised quantum circuits for data loading and orthogonal neural layers, we introduce three types of quantum transformers for training and inference, including a quantum transformer based on compound matrices, which guarantees a theoretical advantage of the quantum attention mechanism compared to its classical counterpart both in terms of asymptotic run time and the number of model parameters. These quantum architectures can be built using shallow quantum circuits and produce qualitatively different classification models. The three proposed quantum attention layers vary on the spectrum between closely following the classical transformers and exhibiting more quantum characteristics. As building blocks of the quantum transformer, we propose a novel method for loading a matrix as quantum states as well as two new trainable quantum orthogonal layers adaptable to different levels of connectivity and quality of quantum computers. We performed extensive simulations of the quantum transformers on standard medical image datasets that showed competitive, and at times better, performance compared to the classical benchmarks, including the best-in-class classical vision transformers. The quantum transformers we trained on these small-scale datasets require fewer parameters compared to standard classical benchmarks. Finally, we implemented our quantum transformers on superconducting quantum computers and obtained encouraging results for experiments with up to six qubits.
Submitted 20 February, 2024; v1 submitted 16 September, 2022;
originally announced September 2022.
-
LIFI: Towards Linguistically Informed Frame Interpolation
Authors:
Aradhya Neeraj Mathur,
Devansh Batra,
Yaman Kumar,
Rajiv Ratn Shah,
Roger Zimmermann
Abstract:
In this work, we explore a new problem of frame interpolation for speech videos. Such content is today a major form of online communication. We try to solve this problem by using several deep learning video generation algorithms to generate the missing frames. We also provide examples where computer vision models, despite showing high performance on conventional non-linguistic metrics, fail to produce faithful interpolation of speech. With this motivation, we provide a new set of linguistically-informed metrics specifically targeted at the problem of speech video interpolation. We also release several datasets to test computer vision video generation models on their speech understanding.
Submitted 2 December, 2020; v1 submitted 30 October, 2020;
originally announced October 2020.
-
Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics
Authors:
Nitika Mathur,
Timothy Baldwin,
Trevor Cohn
Abstract:
Automatic metrics are fundamental for the development and evaluation of machine translation systems. Judging whether, and to what extent, automatic metrics concur with the gold standard of human evaluation is not a straightforward problem. We show that current methods for judging metrics are highly sensitive to the translations used for assessment, particularly the presence of outliers, which often leads to falsely confident conclusions about a metric's efficacy. Finally, we turn to pairwise system ranking, developing a method for thresholding performance improvement under an automatic metric against human judgements, which allows quantification of type I versus type II errors incurred, i.e., insignificant human differences in system quality that are accepted, and significant human differences that are rejected. Together, these findings suggest improvements to the protocols for metric evaluation and system performance evaluation in machine translation.
Submitted 12 June, 2020; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Multimodal Medical Volume Colorization from 2D Style
Authors:
Aradhya Neeraj Mathur,
Apoorv Khattar,
Ojaswa Sharma
Abstract:
Colorization involves the synthesis of colors on a target image while preserving structural content as well as the semantics of the target image. This is a well-explored problem in 2D with many state-of-the-art solutions. We propose a novel deep learning-based approach for the colorization of 3D medical volumes. Our system is capable of directly mapping the colors of a 2D photograph to a 3D MRI volume in real-time, producing a high-fidelity color volume suitable for photo-realistic visualization. Since this work is the first of its kind, we discuss the full pipeline in detail and the challenges that it brings for 3D medical data. The colorization of a medical MRI volume also entails modality conversion, which highlights the robustness of our approach in handling multi-modal data.
Submitted 6 April, 2020;
originally announced April 2020.
-
Load Balancing Optimization in LTE/LTE-A Cellular Networks: A Review
Authors:
Sumita Mishra,
Nidhi Mathur
Abstract:
During the past few decades, wireless technology has seen tremendous growth. The recent introduction of high-end mobile devices has further increased subscribers' demand for high bandwidth. Current cellular systems require manual configuration and management of networks, which is now costly, time-consuming, and error-prone due to the exponentially increasing number of mobile users and nodes. This has led to the introduction of self-organizing capabilities for network management with minimal human involvement, which are expected to permit higher end-user Quality of Service (QoS) along with lower operational and maintenance costs for telecom service providers. Self-organized cellular networks incorporate a collection of functions for automatic configuration, optimization, and maintenance of cellular networks. As mobile end users continue to use network resources while moving from one cell to another, the traffic load within a cell does not remain constant. Thus load balancing, as part of a self-organized network solution, has become one of the most active and emerging fields of research in cellular networks. It involves the transfer of load from overloaded cells to neighbouring cells with free resources for a more balanced load distribution, in order to maintain appropriate end-user experience and network performance. This paper presents a review of various load balancing techniques currently used in mobile networks, with special emphasis on techniques that are suitable for the self-optimization feature in future cellular networks.
Submitted 23 December, 2014;
originally announced December 2014.