Search | arXiv e-print repository

Town Hall Debate Prompting: Enhancing Logical Reasoning in LLMs through Multi-Persona Interaction

Authors: Vivaan Sandwar, Bhav Jain, Rishan Thangaraj, Ishaan Garg, Michael Lam, Kevin Zhu

Abstract: Debate is a commonly used form of human communication catered towards problem-solving because of its efficiency. Debate fundamentally allows multiple viewpoints to be brought up in problem-solving, and for complex problems, each viewpoint opens a new path for problem-solving. In this work, we apply this concept to LLM decision-making by proposing town hall-style debate prompting (THDP), a prompting method that splices a language model into multiple personas that will debate one another to reach a conclusion. Our experimental pipeline varies both the number of personas and the personality types of each persona to find the optimum town hall size and personality for benchmark performance as measured by ZebraLogic bench, a reasoning-intensive benchmark characterized by both multiple-choice and fill-in-the-blank questions. Our experimental results demonstrate that a town hall size of 5 personas with LLM-determined personality types performs optimally on ZebraLogic, achieving a 13% improvement over one-shot CoT baselines in per-cell accuracy in GPT-4o, 9% puzzle accuracy increase in Claude 3.5 Sonnet, and an improvement in hard puzzle accuracy from 10-15%.

Submitted 28 January, 2025; originally announced February 2025.

Comments: Accepted to SoCal NLP Symposium 2024

arXiv:2407.04797 [pdf, other]

Revealing the Utilized Rank of Subspaces of Learning in Neural Networks

Authors: Isha Garg, Christian Koguchi, Eshan Verma, Daniel Ulbricht

Abstract: In this work, we study how well the learned weights of a neural network utilize the space available to them. This notion is related to capacity, but additionally incorporates the interaction of the network architecture with the dataset. Most learned weights appear to be full rank, and are therefore not amenable to low rank decomposition. This deceptively implies that the weights are utilizing the entire space available to them. We propose a simple data-driven transformation that projects the weights onto the subspace where the data and the weight interact. This preserves the functional mapping of the layer and reveals its low rank structure. In our findings, we conclude that most models utilize a fraction of the available space. For instance, for ViTB-16 and ViTL-16 trained on ImageNet, the mean layer utilization is 35% and 20% respectively. Our transformation results in reducing the parameters to 50% and 25% respectively, while resulting in less than 0.2% accuracy drop after fine-tuning. We also show that self-supervised pre-training drives this utilization up to 70%, justifying its suitability for downstream tasks.

Submitted 5 July, 2024; originally announced July 2024.

Comments: Presented at Efficient Systems for Foundation Models Workshop at the International Conference on Machine Learning (ICML) 2024

arXiv:2403.13082 [pdf, other]

Pruning for Improved ADC Efficiency in Crossbar-based Analog In-memory Accelerators

Authors: Timur Ibrayev, Isha Garg, Indranil Chakraborty, Kaushik Roy

Abstract: Deep learning has proved successful in many applications but suffers from high computational demands and requires custom accelerators for deployment. Crossbar-based analog in-memory architectures are attractive for acceleration of deep neural networks (DNN), due to their high data reuse and high efficiency enabled by combining storage and computation in memory. However, they require analog-to-digital converters (ADCs) to communicate crossbar outputs. ADCs consume a significant portion of energy and area of every crossbar processing unit, thus diminishing the potential efficiency benefits. Pruning is a well-studied technique to improve the efficiency of DNNs but requires modifications to be effective for crossbars. In this paper, we motivate crossbar-attuned pruning to target ADC-specific inefficiencies. This is achieved by identifying three key properties (dubbed D.U.B.) that induce sparsity that can be utilized to reduce ADC energy without sacrificing accuracy. The first property ensures that sparsity translates effectively to hardware efficiency by restricting sparsity levels to Discrete powers of 2. The other 2 properties encourage columns in the same crossbar to achieve both Unstructured and Balanced sparsity in order to amortize the accuracy drop. The desired D.U.B. sparsity is then achieved by regularizing the variance of $L_{0}$ norms of neighboring columns within the same crossbar. Our proposed implementation allows it to be directly used in end-to-end gradient-based training. We apply the proposed algorithm to convolutional layers of VGG11 and ResNet18 models, trained on CIFAR-10 and ImageNet datasets, and achieve up to 7.13x and 1.27x improvement, respectively, in ADC energy with less than 1% drop in accuracy.

Submitted 19 March, 2024; originally announced March 2024.

Comments: 11 pages, 5 figures

arXiv:2307.05831 [pdf, other]

Memorization Through the Lens of Curvature of Loss Function Around Samples

Authors: Isha Garg, Deepak Ravikumar, Kaushik Roy

Abstract: Deep neural networks are over-parameterized and easily overfit the datasets they train on. In the extreme case, it has been shown that these networks can memorize a training set with fully randomized labels. We propose using the curvature of loss function around each training sample, averaged over training epochs, as a measure of memorization of the sample. We use this metric to study the generalization versus memorization properties of different samples in popular image datasets and show that it captures memorization statistics well, both qualitatively and quantitatively. We first show that the high curvature samples visually correspond to long-tailed, mislabeled, or conflicting samples, those that are most likely to be memorized. This analysis helps us find, to the best of our knowledge, a novel failure mode on the CIFAR100 and ImageNet datasets: that of duplicated images with differing labels. Quantitatively, we corroborate the validity of our scores via two methods. First, we validate our scores against an independent and comprehensively calculated baseline, by showing high cosine similarity with the memorization scores released by Feldman and Zhang (2020). Second, we inject corrupted samples which are memorized by the network, and show that these are learned with high curvature. To this end, we synthetically mislabel a random subset of the dataset. We overfit a network to it and show that sorting by curvature yields high AUROC values for identifying the corrupted samples. An added advantage of our method is that it is scalable, as it requires training only a single network as opposed to the thousands trained by the baseline, while capturing the aforementioned failure mode that the baseline fails to identify.

Submitted 1 October, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

Comments: Preprint

arXiv:2201.08494 [pdf, other]

doi 10.1109/ACCESS.2024.3390716

TOFU: Towards Obfuscated Federated Updates by Encoding Weight Updates into Gradients from Proxy Data

Authors: Isha Garg, Manish Nagaraj, Kaushik Roy

Abstract: Advances in Federated Learning and an abundance of user data have enabled rich collaborative learning between multiple clients, without sharing user data. This is done via a central server that aggregates learning in the form of weight updates. However, this comes at the cost of repeated expensive communication between the clients and the server, and concerns about compromised user privacy. The inversion of gradients into the data that generated them is termed data leakage. Encryption techniques can be used to counter this leakage, but at added expense. To address these challenges of communication efficiency and privacy, we propose TOFU, a novel algorithm which generates proxy data that encodes the weight updates for each client in its gradients. Instead of weight updates, this proxy data is now shared. Since input data is far lower in dimensional complexity than weights, this encoding allows us to send much less data per communication round. Additionally, the proxy data resembles noise, and even perfect reconstruction from data leakage attacks would invert the decoded gradients into unrecognizable noise, enhancing privacy. We show that TOFU enables learning with less than 1% and 7% accuracy drops on MNIST and on CIFAR-10 datasets, respectively. This drop can be recovered via a few rounds of expensive encrypted gradient exchange. This enables us to learn to near-full accuracy in a federated setup, while being 4x and 6.6x more communication efficient than the standard Federated Averaging algorithm on MNIST and CIFAR-10, respectively.

Submitted 20 January, 2022; originally announced January 2022.

Comments: First two authors contributed equally to the paper

arXiv:2112.10844 [pdf, other]

Encoding Hierarchical Information in Neural Networks helps in Subpopulation Shift

Authors: Amitangshu Mukherjee, Isha Garg, Kaushik Roy

Abstract: Over the past decade, deep neural networks have proven to be adept in image classification tasks, often surpassing humans in terms of accuracy. However, standard neural networks often fail to understand the concept of hierarchical structures and dependencies among different classes for vision related tasks. Humans on the other hand, seem to intuitively learn categories conceptually, progressively growing from understanding high-level concepts down to granular levels of categories. One of the issues arising from the inability of neural networks to encode such dependencies within its learned structure is that of subpopulation shift -- where models are queried with novel unseen classes taken from a shifted population of the training set categories. Since the neural network treats each class as independent from all others, it struggles to categorize shifting populations that are dependent at higher levels of the hierarchy. In this work, we study the aforementioned problems through the lens of a novel conditional supervised training framework. We tackle subpopulation shift by a structured learning procedure that incorporates hierarchical information conditionally through labels. Furthermore, we introduce a notion of graphical distance to model the catastrophic effect of mispredictions. We show that learning in this structured hierarchical manner results in networks that are more robust against subpopulation shifts, with an improvement up to 3% in terms of accuracy and up to 11% in terms of graphical distance over standard models on subpopulation shift benchmarks.

Submitted 13 June, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

Comments: 15 pages, 7 figures

arXiv:2104.12528 [pdf, other]

Spatio-Temporal Pruning and Quantization for Low-latency Spiking Neural Networks

Authors: Sayeed Shafayet Chowdhury, Isha Garg, Kaushik Roy

Abstract: Spiking Neural Networks (SNNs) are a promising alternative to traditional deep learning methods since they perform event-driven information processing. However, a major drawback of SNNs is high inference latency. The efficiency of SNNs could be enhanced using compression methods such as pruning and quantization. Notably, SNNs, unlike their non-spiking counterparts, consist of a temporal dimension, the compression of which can lead to latency reduction. In this paper, we propose spatial and temporal pruning of SNNs. First, structured spatial pruning is performed by determining the layer-wise significant dimensions using principal component analysis of the average accumulated membrane potential of the neurons. This step leads to 10-14X model compression. Additionally, it enables inference with lower latency and decreases the spike count per inference. To further reduce latency, temporal pruning is performed by gradually reducing the timesteps while training. The networks are trained using surrogate gradient descent based backpropagation and we validate the results on CIFAR10 and CIFAR100, using VGG architectures. The spatiotemporally pruned SNNs achieve 89.04% and 66.4% accuracy on CIFAR10 and CIFAR100, respectively, while performing inference with 3-30X reduced latency compared to state-of-the-art SNNs. Moreover, they require 8-14X less compute energy compared to their unpruned standard deep learning counterparts. The energy numbers are obtained by multiplying the number of operations with energy per operation. These SNNs also provide 1-4% higher robustness against Gaussian noise corrupted inputs. Furthermore, we perform weight quantization and find that performance remains reasonably stable up to 5-bit quantization.

Submitted 28 April, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

arXiv:2103.09762 [pdf, other]

Gradient Projection Memory for Continual Learning

Authors: Gobinda Saha, Isha Garg, Kaushik Roy

Abstract: The ability to learn continually without forgetting the past tasks is a desired attribute for artificial learning systems. Existing approaches to enable such learning in artificial neural networks usually rely on network growth, importance based weight update or replay of old data from the memory. In contrast, we propose a novel approach where a neural network learns new tasks by taking gradient steps in the orthogonal direction to the gradient subspaces deemed important for the past tasks. We find the bases of these subspaces by analyzing network representations (activations) after learning each task with Singular Value Decomposition (SVD) in a single shot manner and store them in the memory as Gradient Projection Memory (GPM). With qualitative and quantitative analyses, we show that such orthogonal gradient descent induces minimum to no interference with the past tasks, thereby mitigating forgetting. We evaluate our algorithm on diverse image classification datasets with short and long sequences of tasks and report better or on-par performance compared to the state-of-the-art approaches.

Submitted 17 March, 2021; originally announced March 2021.

Comments: Accepted for Oral Presentation at ICLR 2021 https://openreview.net/forum?id=3AOj0RCNC2

Journal ref: International Conference on Learning Representations (ICLR), 2021
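
The projection step described in the Gradient Projection Memory abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `gpm_basis` and `project_out`, the `energy` threshold, and the features-by-samples layout of the activation matrix are all assumptions made for illustration.

```python
import numpy as np

def gpm_basis(activations, energy=0.95):
    """Orthonormal basis for the dominant subspace of `activations`.

    `activations` is assumed to have shape (features, samples). The
    `energy` threshold (fraction of squared singular-value mass to keep)
    is a hypothetical hyperparameter, not a value from the abstract.
    """
    U, S, _ = np.linalg.svd(activations, full_matrices=False)
    ratio = np.cumsum(S**2) / np.sum(S**2)
    # Smallest k whose leading singular values reach the energy threshold.
    k = min(int(np.searchsorted(ratio, energy)) + 1, len(S))
    return U[:, :k]

def project_out(grad, basis):
    """Remove from `grad` (outputs x features) its component in span(basis)."""
    return grad - grad @ basis @ basis.T

# Usage: a rank-3 activation matrix standing in for a past task.
rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 50))
basis = gpm_basis(acts)
grad = rng.standard_normal((4, 8))
safe_grad = project_out(grad, basis)
```

After projection, `safe_grad @ basis` is zero up to floating-point error, so a step along `safe_grad` leaves the layer's response to inputs inside the stored subspace unchanged.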

arXiv:2012.08398 [pdf, other]

Exploring Vicinal Risk Minimization for Lightweight Out-of-Distribution Detection

Authors: Deepak Ravikumar, Sangamesh Kodge, Isha Garg, Kaushik Roy

Abstract: Deep neural networks have found widespread adoption in solving complex tasks ranging from image recognition to natural language processing. However, these networks make confident mispredictions when presented with data that does not belong to the training distribution, i.e. out-of-distribution (OoD) samples. In this paper we explore whether the property of Vicinal Risk Minimization (VRM) to smoothly interpolate between different class boundaries helps to train better OoD detectors. We apply VRM to existing OoD detection techniques and show their improved performance. We observe that existing OoD detectors have significant memory and compute overhead, hence we leverage VRM to develop an OoD detector with minimal overhead. Our detection method introduces an auxiliary class for classifying OoD samples. We utilize mixup in two ways to implement Vicinal Risk Minimization. First, we perform mixup within the same class and second, we perform mixup with Gaussian noise when training the auxiliary class. Our method achieves near competitive performance with significantly less compute and memory overhead when compared to existing OoD detection techniques. This facilitates the deployment of OoD detection on edge devices and expands our understanding of Vicinal Risk Minimization for use in training OoD detectors.

Submitted 15 December, 2020; originally announced December 2020.
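
The two uses of mixup named in the abstract above (between samples of the same class, and between a sample and Gaussian noise) can be sketched as follows. This is a generic mixup illustration under stated assumptions, not the paper's training code: the helper names, the Beta(`alpha`, `alpha`) mixing distribution, and the `sigma` noise scale are hypothetical, and label handling for the auxiliary class is deliberately left out because the abstract does not specify it.

```python
import numpy as np

def mixup(x_a, x_b, alpha=1.0, rng=None):
    """Convex combination of two inputs with a Beta-distributed weight."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)  # mixing weight in [0, 1]
    return lam * x_a + (1.0 - lam) * x_b, lam

def mixup_with_noise(x, alpha=1.0, sigma=1.0, rng=None):
    """Mix an input with Gaussian noise of the same shape (assumed scale `sigma`)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=x.shape)
    return mixup(x, noise, alpha, rng)

# Usage: mix two same-class images, then mix one image with noise.
rng = np.random.default_rng(0)
img_a = np.zeros((3, 4, 4))
img_b = np.ones((3, 4, 4))
same_class_mix, lam = mixup(img_a, img_b, rng=rng)
noisy_mix, lam_noise = mixup_with_noise(img_a, rng=rng)
```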

arXiv:2010.01795 [pdf, other]

DCT-SNN: Using DCT to Distribute Spatial Information over Time for Learning Low-Latency Spiking Neural Networks

Authors: Isha Garg, Sayeed Shafayet Chowdhury, Kaushik Roy

Abstract: Spiking Neural Networks (SNNs) offer a promising alternative to traditional deep learning frameworks, since they provide higher computational efficiency due to event-driven information processing. SNNs distribute the analog values of pixel intensities into binary spikes over time. However, the most widely used input coding schemes, such as Poisson based rate-coding, do not leverage the additional temporal learning capability of SNNs effectively. Moreover, these SNNs suffer from high inference latency which is a major bottleneck to their deployment. To overcome this, we propose a scalable time-based encoding scheme that utilizes the Discrete Cosine Transform (DCT) to reduce the number of timesteps required for inference. DCT decomposes an image into a weighted sum of sinusoidal basis images. At each time step, the Hadamard product of the DCT coefficients and a single frequency base, taken in order, is given to an accumulator that generates spikes upon crossing a threshold. We use the proposed scheme to learn DCT-SNN, a low-latency deep SNN with leaky-integrate-and-fire neurons, trained using surrogate gradient descent based backpropagation. We achieve top-1 accuracy of 89.94%, 68.3% and 52.43% on CIFAR-10, CIFAR-100 and TinyImageNet, respectively using VGG architectures. Notably, DCT-SNN performs inference with 2-14X reduced latency compared to other state-of-the-art SNNs, while achieving comparable accuracy to their standard deep learning counterparts. The dimension of the transform allows us to control the number of timesteps required for inference. Additionally, we can trade-off accuracy with latency in a principled manner by dropping the highest frequency components during inference.

Submitted 5 October, 2020; originally announced October 2020.

Comments: The first two authors contributed equally to this paper

arXiv:2008.01524 [pdf, other]

TREND: Transferability based Robust ENsemble Design

Authors: Deepak Ravikumar, Sangamesh Kodge, Isha Garg, Kaushik Roy

Abstract: Deep Learning models hold state-of-the-art performance in many fields, but their vulnerability to adversarial examples poses a threat to their ubiquitous deployment in practical settings. Additionally, adversarial inputs generated on one classifier have been shown to transfer to other classifiers trained on similar data, which makes the attacks possible even if model parameters are not revealed to the adversary. This property of transferability has not yet been systematically studied, leading to a gap in our understanding of robustness of neural networks to adversarial inputs. In this work, we study the effect of network architecture, initialization, optimizer, input, weight and activation quantization on transferability of adversarial samples. We also study the effect of different attacks on transferability. Our experiments reveal that transferability is significantly hampered by input quantization and architectural mismatch between source and target, is unaffected by initialization but the choice of optimizer turns out to be critical. We observe that transferability is architecture-dependent for both weight and activation quantized models. To quantify transferability, we use a simple metric and demonstrate the utility of the metric in designing a methodology to build ensembles with improved adversarial robustness. When attacking ensembles we observe that "gradient domination" by a single ensemble member model hampers existing attacks. To combat this we propose a new state-of-the-art ensemble attack. We compare the proposed attack with existing attack techniques to show its effectiveness. Finally, we show that an ensemble consisting of carefully chosen diverse networks achieves better adversarial robustness than would otherwise be possible with a single network.

Submitted 30 March, 2021; v1 submitted 4 August, 2020; originally announced August 2020.

arXiv:2001.08650 [pdf, other]

SPACE: Structured Compression and Sharing of Representational Space for Continual Learning

Authors: Gobinda Saha, Isha Garg, Aayush Ankit, Kaushik Roy

Abstract: Humans learn adaptively and efficiently throughout their lives. However, incrementally learning tasks causes artificial neural networks to overwrite relevant information learned about older tasks, resulting in 'Catastrophic Forgetting'. Efforts to overcome this phenomenon often utilize resources poorly, for instance, by growing the network architecture or needing to save parametric importance scores, or violate data privacy between tasks. To tackle this, we propose SPACE, an algorithm that enables a network to learn continually and efficiently by partitioning the learnt space into a Core space, that serves as the condensed knowledge base over previously learned tasks, and a Residual space, which is akin to a scratch space for learning the current task. After learning each task, the Residual is analyzed for redundancy, both within itself and with the learnt Core space. A minimal number of extra dimensions required to explain the current task are added to the Core space and the remaining Residual is freed up for learning the next task. We evaluate our algorithm on P-MNIST, CIFAR and a sequence of 8 different datasets, and achieve comparable accuracy to the state-of-the-art methods while overcoming catastrophic forgetting. Additionally, our algorithm is well suited for practical use. The partitioning algorithm analyzes all layers in one shot, ensuring scalability to deeper networks. Moreover, the analysis of dimensions translates to filter-level sparsity, and the structured nature of the resulting architecture gives us up to 5x improvement in energy efficiency during task inference over the current state-of-the-art.

Submitted 3 February, 2021; v1 submitted 23 January, 2020; originally announced January 2020.

Comments: The first two authors contributed equally to this paper

arXiv:1906.01493 [pdf, other]

doi 10.1038/s42256-019-0134-0

Constructing Energy-efficient Mixed-precision Neural Networks through Principal Component Analysis for Edge Intelligence

Authors: Indranil Chakraborty, Deboleena Roy, Isha Garg, Aayush Ankit, Kaushik Roy

Abstract: The 'Internet of Things' has brought increased demand for AI-based edge computing in applications ranging from healthcare monitoring systems to autonomous vehicles. Quantization is a powerful tool to address the growing computational cost of such applications, and yields significant compression over full-precision networks. However, quantization can result in substantial loss of performance for complex image classification tasks. To address this, we propose a Principal Component Analysis (PCA) driven methodology to identify the important layers of a binary network, and design mixed-precision networks. The proposed Hybrid-Net achieves a more than 10% improvement in classification accuracy over binary networks such as XNOR-Net for ResNet and VGG architectures on CIFAR-100 and ImageNet datasets while still achieving up to 94% of the energy-efficiency of XNOR-Nets. This work furthers the feasibility of using highly compressed neural networks for energy-efficient neural computing in edge devices.

Submitted 2 December, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: 14 pages, 4 figures, 8 tables

Journal ref: Nature Machine Intelligence, 2, 43-55 (2020)

arXiv:1812.06224 [pdf, other]

doi 10.1109/ACCESS.2019.2961960

A Low Effort Approach to Structured CNN Design Using PCA

Authors: Isha Garg, Priyadarshini Panda, Kaushik Roy

Abstract: Deep learning models hold state of the art performance in many fields, yet their design is still based on heuristics or grid search methods that often result in overparametrized networks. This work proposes a method to analyze a trained network and deduce an optimized, compressed architecture that preserves accuracy while keeping computational costs tractable. Model compression is an active field of research that targets the problem of realizing deep learning models in hardware. However, most pruning methodologies tend to be experimental, requiring large compute and time intensive iterations of retraining the entire network. We introduce structure into model design by proposing a single shot analysis of a trained network that serves as a first order, low effort approach to dimensionality reduction, by using PCA (Principal Component Analysis). The proposed method simultaneously analyzes the activations of each layer and considers the dimensionality of the space described by the filters generating these activations. It optimizes the architecture in terms of number of layers, and number of filters per layer without any iterative retraining procedures, making it a viable, low effort technique to design efficient networks. We demonstrate the proposed methodology on AlexNet and VGG style networks on the CIFAR-10, CIFAR-100 and ImageNet datasets, and successfully achieve an optimized architecture with a reduction of up to 3.8X and 9X in the number of operations and parameters respectively, while trading off less than 1% accuracy. We also apply the method to MobileNet, and achieve 1.7X and 3.9X reduction in the number of operations and parameters respectively, while improving accuracy by almost one percentage point.

Submitted 10 January, 2020; v1 submitted 14 December, 2018; originally announced December 2018.

Comments: To be Published in IEEE Access, Volume 8, 2020
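
As a rough illustration of the PCA-based analysis described in the abstract above, the sketch below counts how many principal components of a layer's activations are needed to explain a chosen fraction of the variance. It is a sketch under assumptions rather than the paper's procedure: the function name `significant_dims`, the (samples, channels) layout, and the 0.999 variance threshold are hypothetical choices not stated in the abstract.

```python
import numpy as np

def significant_dims(activations, threshold=0.999):
    """Number of principal components explaining `threshold` of the variance.

    `activations` is assumed to have shape (samples, channels), e.g. a conv
    layer's output flattened over batch and spatial positions. The threshold
    value is an illustrative assumption.
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Squared singular values of centered data are proportional to PCA variances.
    s = np.linalg.svd(centered, compute_uv=False)
    var = s**2
    ratio = np.cumsum(var) / var.sum()
    return min(int(np.searchsorted(ratio, threshold)) + 1, len(var))

# Usage: 64 channels whose activations actually live in a 5-dimensional subspace.
rng = np.random.default_rng(0)
acts = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 64))
suggested_filters = significant_dims(acts)
```

Here the returned count would serve as a candidate filter count for the layer; how the paper turns such counts into layer and filter choices is not detailed in the abstract.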

arXiv:1802.03915 [pdf, other]

doi 10.1103/PhysRevD.98.063523

Topological pseudodefects of a supersymmetric $SO(10)$ model and cosmology

Authors: Ila Garg, Urjit A. Yajnik

Abstract: Obtaining realistic supersymmetry preserving vacua in the minimal renormalizable supersymmetric $Spin(10)$ GUT model introduces considerations of the non-trivial topology of the vacuum manifold. The $D$-parity of low energy unification schemes gets lifted to a one-parameter subgroup $U(1)_D$ of $Spin(10)$. Yet, the choice of the fields signaling spontaneous symmetry breaking leads to disconnected subsets in the vacuum manifold related by the $D$-parity. The resulting domain walls, existing due to topological reasons but not stable, are identified as topological pseudodefects. We obtain a class of one-parameter paths connecting $D$-parity flipped vacua and compute the energy barrier height along the same. We consider the various patterns of symmetry breaking which can result in either intermediate scale gauge groups or a supersymmetric extension of the Standard Model. If the onset of inflation is subsequent to GUT breaking, as could happen also if inflation is naturally explained by the same GUT, the existence of such pseudodefects can leave signatures in the CMB. Specifically, this could have an impact on the scale invariance of the CMB fluctuations and LSS data at the largest scale.

Submitted 29 September, 2018; v1 submitted 12 February, 2018; originally announced February 2018.

Comments: 8 Pages, 2 figures, matches with the published version in PRD

Journal ref: Phys. Rev. D 98, 063523 (2018)

arXiv:1711.01979 [pdf, other]

doi 10.1142/S0217751X18501270

No-scale SUGRA Inflation and Type-I seesaw

Authors: Ila Garg, Subhendra Mohanty

Abstract: We show that MSSM with three right handed neutrinos incorporating a renormalizable Type-I seesaw superpotential and no-scale SUGRA Kähler potential can lead to a Starobinsky kind of inflation potential along a flat direction associated with gauge invariant combination of Higgs, slepton and right handed sneutrino superfields. The inflation conditions put constraints on the Dirac Yukawa coupling and the Majorana masses required for the neutrino masses and also demand tuning among the parameters. The scale of inflation is set by the mass of the heaviest right handed neutrino. We also fit the neutrino data from oscillation experiments at low scale using the effective RGEs of MSSM with three right handed neutrinos.

Submitted 27 July, 2018; v1 submitted 6 November, 2017; originally announced November 2017.

Comments: 16 Pages, Version accepted for publication in IJMPA, Journal-ref: IJMPA 33, 1850127 (2018)

arXiv:1706.08851 [pdf, other]

doi 10.1103/PhysRevD.96.055020

Electroweak vacuum stability in presence of singlet scalar dark matter in TeV scale seesaw models

Authors: Ila Garg, Srubabati Goswami, Vishnudath K. N., Najimuddin Khan

Abstract: We consider singlet extensions of the standard model, both in the fermion and the scalar sector, to account for the generation of neutrino mass at the TeV scale and the existence of dark matter respectively. For the neutrino sector we consider models with extra singlet fermions which can generate neutrino mass via the so-called inverse or linear seesaw mechanism whereas a singlet scalar is introduced as the candidate for dark matter. We show that although these two sectors are disconnected at low energy, the coupling constants of both the sectors get correlated at high energy scale by the constraints coming from the perturbativity and stability/metastability of the electroweak vacuum. The singlet fermions try to destabilize the electroweak vacuum while the singlet scalar aids the stability. As an upshot, the electroweak vacuum may attain absolute stability even up to the Planck scale for suitable values of the parameters. We delineate the parameter space for the singlet fermion and the scalar couplings for which the electroweak vacuum remains stable/metastable while at the same time giving the correct relic density and neutrino masses and mixing angles as observed.

Submitted 27 June, 2017; originally announced June 2017.

Comments: 35 pages, 15 figures

Journal ref: Phys. Rev. D 96, 055020 (2017)

arXiv:1509.00422 [pdf, other]

doi 10.1103/PhysRevD.98.075006

NMSGUT emergence and Trans-Unification RG flows

Authors: Charanjit S. Aulakh, Ila Garg, Charanjit K. Khosa

Abstract: Consistency of trans-unification RG evolution is used to discuss the domain of definition of the New Minimal Supersymmetric SO(10) GUT (NMSGUT). We compute the 1-loop RGE $β$ functions, simplifying generic formulae using constraints of gauge invariance and superpotential structure. We also calculate the 2 loop contributions to the gauge coupling and gaugino mass and indicate how to get full 2 loop results for all couplings. Our method overcomes combinatorial barriers that frustrate computer algebra based attempts to calculate SO(10) $β$ functions involving large irreps. Use of the RGEs identifies a perturbative domain $Q < M_E$, where $M_E <M_{Planck}$ is the \emph{scale of emergence} where the NMSGUT, with GUT compatible soft supersymmetry breaking terms, emerges from the strong UV dynamics associated with the Landau poles in gauge and Yukawa couplings. Due to the strength of the RG flows the Landau poles for gauge and Yukawa couplings lie near a cutoff scale $Λ_E $ for the perturbative dynamics of the NMSGUT which lies just above $M_E$. SO(10) RG flows into the IR are shown to facilitate small gaugino masses and generation of negative Non Universal Higgs masses squared needed by realistic NMSGUT fits of low energy data. Running the simple canonical theory emergent at $M_E$ through $M_X$ down to the electroweak scale enables tests of candidate scenarios such as supergravity based NMSGUT with canonical kinetic terms and NMSGUT based dynamical Yukawa unification.

Submitted 22 October, 2018; v1 submitted 1 September, 2015; originally announced September 2015.

Comments: 36 pages, 1 Figure, 4 Tables, 77 equations, 42 references, RevTeX4 PDFLateX. Version published in Phys. Rev. D

Journal ref: Phys. Rev. D 98, 075006 (2018)

arXiv:1506.05204 [pdf, other]

New minimal supersymmetric SO(10) GUT phenomenology and its cosmological implications

Authors: Ila Garg

Abstract: Supersymmetric GUTs based on SO(10) gauge group are leading contenders to describe particle physics beyond the Standard Model. Among these the "New minimal supersymmetric SO(10) grand unified theory" (NMSGUT) based on Higgs system 10+120+210+126+$\overline{126}$ has been developing since 1982. It now successfully fits the whole Standard Model gauge coupling, symmetry breaking and fermion mass-mixing data as well as the neutrino mass and mixing data in terms of NMSGUT parameters and just 6 soft supersymmetry breaking parameters defined at the GUT scale. In this thesis we study the phenomenology of NMSGUT, its implications for inflationary and Cold Dark matter cosmology and develop Renormalization group (RG) equations for the flow of NMSGUT couplings in the extreme ultraviolet. In the first part we show that superheavy threshold effects can drastically lower the SO(10) Yukawa couplings required for realistic unification and this cures the long standing problem of fast proton decay in Susy GUT. Then we propose a novel Supersymmetric Seesaw inflection (SSI) scenario based upon a SU(2)_L x U(1)_R x U(1)_{B-L} invariant model, where the inflaton mass is controlled by the large conjugate sneutrino mass. We show that it is much less fine-tuned and more stable than Dirac sneutrino based MSSM inflation. NMSGUT can embed SSI, and even provide a large tensor-to-scalar ratio, but obstacles in achieving enough inflation remain. The NMSGUT Bino LSP is a good dark matter candidate when it can co-annihilate with a nearly degenerate sfermion as in fits with a light smuon.
We also calculate two loop NMSGUT gauge-Yukawa Renormalization Group (RG) beta functions and show that GUT scale negative Higgs mass squared parameters required by NMSGUT fits can arise by RG flows from positive values at the Planck scale.

Submitted 4 September, 2015; v1 submitted 17 June, 2015; originally announced June 2015.

Comments: Ph.D Thesis (Defended on 1 April, 2015), 187 Pages, 16 Figures, References added

arXiv:1504.07725 [pdf, ps, other]

doi 10.1016/j.physletb.2015.10.011

No scale SUGRA SO(10) derived Starobinsky Model of Inflation

Authors: Ila Garg, Subhendra Mohanty

Abstract: We show that a supersymmetric renormalizable theory based on gauge group SO(10) and Higgs system {\bf {10 $\oplus$ 210 $\oplus$ 126 $\oplus$ $\overline{\bf 126}$}} with no scale supergravity can lead to a Starobinsky kind of potential for inflation. Successful inflation is possible in the cases where the potential during inflation corresponds to $SU(3)_C \times SU(2)_L \times SU(2)_R \times U(1)_{B-L}$, $SU(5)\times U(1)$ and flipped $SU(5)\times U(1)$ intermediate symmetry with a suitable choice of superpotential parameters. The reheating in such a scenario can occur via non-perturbative decay of the inflaton, i.e. through "preheating". After the end of reheating, when the universe cools down, the finite temperature potential can have a minimum which corresponds to MSSM.

Submitted 6 October, 2015; v1 submitted 29 April, 2015; originally announced April 2015.

Comments: 6 pages, 2 figures, Replaced with version to appear in Phys Lett B

arXiv:1311.6100 [pdf, ps, other]

doi 10.1016/j.nuclphysb.2014.03.003

Baryon Stability on the Higgs Dissolution Edge : Threshold corrections and suppression of Baryon violation in the NMSGUT

Authors: Charanjit S. Aulakh, Ila Garg, Charanjit K. Khosa

Abstract: Superheavy threshold corrections to the matching condition between matter Yukawa couplings of the effective Minimal Supersymmetric Standard Model (MSSM) and the New Minimal Supersymmetric (SO(10)) GUT (NMSGUT) provide a novel and generic mechanism for reducing the long standing and generically problematic operator dimension 5 Baryon decay rates. In suitable regions of the parameter space strong wave function renormalization of the effective MSSM Higgs doublets due to the large number of heavy fields can take the wave function renormalization of the MSSM Higgs field close to the dissolution value ($Z_{H,\overline{H}}=0$). Rescaling to canonical kinetic terms lowers the SO(10) Yukawas required to match the MSSM fermion data. Since the same Yukawas determine the dimension 5 B violation operator coefficients, the associated rates can be suppressed to levels compatible with current limits. Including these threshold effects also relaxes the constraint $ y_b-y_τ\simeq y_s-y_μ$ operative between $\textbf{10} -\textbf{120} $ plet generated tree level MSSM matter fermion Yukawas $y_f$. We exhibit accurate fits of the MSSM fermion mass-mixing data in terms of NMSGUT superpotential couplings and 5 independent soft Susy breaking parameters specified at $10^{16.25}\,$ GeV with the claimed suppression of Baryon decay rates. As before, our s-spectra are of the mini split supersymmetry type with large $|A_0|,μ,m_{H,\overline H} > 100\,\,$ TeV, light gauginos and normal s-hierarchy.
Large $A_0,μ$ and soft masses allow significant deviation from the canonical GUT gaugino mass ratios and ensure vacuum safety. Even without optimization, prominent candidates for BSM discovery such as the muon magnetic anomaly, $b\rightarrow sγ$ and Lepto-genesis CP violation emerge in the preferred ball park.

Submitted 25 March, 2014; v1 submitted 24 November, 2013; originally announced November 2013.

Comments: PdfLatex. 50 pages. Version accepted for publication in Nuclear Phys.B(2014). Available online at http://dx.doi.org/10.1016/j.nuclphysb.2014.03.003. arXiv admin note: substantial text overlap with arXiv:1107.2963

Journal ref: Nuclear Physics B882(2014) 397

arXiv:1201.0519 [pdf, ps, other]

doi 10.1103/PhysRevD.86.065001

Supersymmetric Seesaw Inflation

Authors: Charanjit S. Aulakh, Ila Garg

Abstract: Supersymmetric Unified theories which incorporate a renormalizable Type I seesaw mechanism for small neutrino masses can also provide slow roll inflection point inflation along a flat direction associated with a gauge invariant combination of the Higgs, slepton and right handed sneutrino superfields. Inflationary parameters are related to the Majorana and Dirac couplings responsible for neutrino masses with the scale of inflation set by a right-handed neutrino mass $M_{ν^c} \sim 10^6-10^{12}$ GeV. Tuning of the neutrino Dirac and Majorana superpotential couplings and soft Susy breaking parameters is required to enforce flatness of the inflationary potential. In contrast to previous inflection point inflation models the cubic term is dominantly derived from superpotential couplings rather than soft A-terms. Thus, since $M_{ν^c} \gg M_{Susy}$, the tuning condition is almost independent of the soft supersymmetry breaking parameters and therefore more stable. The required fine tuning is also less stringent than for Minimal SUSY Standard Model (MSSM) inflation or Dirac neutrino `A-term' inflation scenarios due to the much larger value of the inflaton mass. Reheating proceeds via `instant preheating' which rapidly dumps all the inflaton energy into a MSSM mode radiation bath giving a high reheat temperature $T_{rh} \approx M_{ν^c}^{3/4}\, 10^{6}$ GeV $\sim 10^{11}- 10^{15} $ GeV. Thus our scenario requires large gravitino mass $> 50 $ TeV to avoid a gravitino problem.
The `instant preheating' and Higgs component of the inflaton also imply a `non-thermal' contribution to Leptogenesis due to facilitated production of right handed neutrinos during inflaton decay. We derive the tuning conditions for the scenario to work in the realistic New Minimal Supersymmetric SO(10) GUT and show that they can be satisfied by realistic fits.

Submitted 20 September, 2012; v1 submitted 2 January, 2012; originally announced January 2012.

Comments: Version published in Phys. Rev. D 86,065001 (2012), Sept. 15, 2012

Journal ref: Phys. Rev. D 86, 065001 (2012)

arXiv:1103.5601 [pdf, ps, other]

Non-linear interaction in random matrix models of RNA

Authors: Itty Garg, Pradeep Bhadola, N. Deo

Abstract: A non-linear Penner type interaction is introduced and studied in the random matrix model of homo-RNA. The asymptotics in length of the partition function is discussed for small and large $N$ (size of matrix). The interaction doubles the coupling ($v$) between the bases and the dependence of the combinatoric factor on ($v,N$) is found. For small $N$, the effect of interaction changes the power law exponents for the secondary and tertiary structures. The specific heat shows different analytical behavior in the two regions of $N$, with a peculiar double peak in its second derivative for N=1 at low temperature. Tapping the model indicates the presence of multiple solutions.

Submitted 29 March, 2011; originally announced March 2011.

Comments: 7 pages, 2 figures, 1 table

arXiv:0911.3710 [pdf, ps, other]

Scaling, phase transition and genus distribution functions in matrix models of RNA with linear external interactions

Authors: I. Garg, N. Deo

Abstract: A linear external perturbation is introduced in the action of the partition function of the random matrix model of RNA [G. Vernizzi, H. Orland and A. Zee, Phys. Rev. Lett. 94, 168103 (2005)]. It is seen that (i) the perturbation distinguishes between paired and unpaired bases in that there are structural changes, from unpaired and paired base structures ($0 \leq α< 1$) to completely paired base structures ($α=1$), as the perturbation parameter $α$ approaches 1 ($α$ is the ratio of interaction strengths of original and perturbed terms in the action of the partition function), (ii) the genus distributions exhibit small differences for small even and odd lengths $L$, (iii) the partition function of the linear interacting matrix model is related via a scaling formula to the re-scaled partition function of the random matrix model of RNA, (iv) the free energy and specific heat are plotted as functions of $L$, $α$ and temperature $T$ and their first derivative with respect to $α$ is plotted as a function of $α$. The free energy shows a phase transition at $α=1$ for odd (both small and large) lengths and for even lengths the transition at $α=1$ gets sharper and sharper as more pseudoknots are included (that is for large lengths).

Submitted 19 November, 2009; originally announced November 2009.

Comments: 20 pages, 22 figures, 1 table

arXiv:0908.3412 [pdf, ps, other]

Nitrogen clusters inside C60 cage and new nanoscale energetic materials

Authors: Hitesh Sharma, Isha Garg, Keya Dharamvir, V. K. Jindal

Abstract: We explore the possibility to trap polynitrogen clusters inside C60 fullerene cage, opening a new direction of developing nitrogen-rich high energy materials. We found that a maximum of 13 nitrogen atoms can be encapsulated in a C60 cage. The nitrogen clusters in confinement exhibit unique stable structures in polymeric form which possess a large component of (~ 70-80%) single bond character. The Nn@C60 molecules retain their structure at 300K for n<12. The Mulliken charge analysis shows very small charge transfer in N@C60, consistent with the quartet spin state of N. However, for 2<n<10, charge transfer takes place from cage surface to Nn compounds and inverse polarization thereafter. These nitrogen clusters, when allowed to relax to N2 molecules which are triply bonded, are capable of releasing a large amount of energy.

Submitted 24 August, 2009; originally announced August 2009.

Comments: 25 pages Submitted to Carbon

arXiv:0809.1016 [pdf, ps, other]

doi 10.1103/PhysRevE.79.061903

RNA matrix models with external interactions and their asymptotic behaviour

Authors: I. Garg, N. Deo

Abstract: We study a matrix model of RNA in which an external perturbation acts on n nucleotides of the polymer chain. The effect of the perturbation appears in the exponential generating function of the partition function as a factor $(1-\frac{nα}{L})$ [where $α$ is the ratio of strengths of the original to the perturbed term and L is length of the chain]. The asymptotic behaviour of the genus distribution functions for the extended matrix model is analyzed numerically when (i) $n=L$ and (ii) $n=1$. In these matrix models of RNA, as $nα/L$ is increased from 0 to 1, it is found that the universality of the number of diagrams $a_{L, g}$ at a fixed length L and genus g changes from $3^{L}$ to $(3-\frac{nα}{L})^{L}$ ($2^{L}$ when $nα/L=1$) and the asymptotic expression of the total number of diagrams $\cal N$ at a fixed length L but independent of genus g, changes in the factor $\exp^{\sqrt{L}}$ to $\exp^{(1-\frac{nα}{L})\sqrt{L}}$ ($\exp^{0}=1$ when $nα/L=1$).

Submitted 5 September, 2008; originally announced September 2008.

Comments: 9 pages, 5 figures, 2 tables

arXiv:0802.2440 [pdf, ps, other]

Genus Distributions For Extended Matrix Models Of RNA

Authors: Itty Garg, N. Deo

Abstract: We construct and study an extended random matrix model of RNA (polymer) folding. A perturbation which acts on all the nucleotides in the chain is added to the action of the RNA partition function. The effect of this perturbation on the partition function and the Genus Distributions is studied. This perturbation distinguishes between the paired and unpaired bases. For example, for $α= 1$ (where $α$ is the ratio of the strengths of the original and perturbed term in the action) the partition function and genus distribution for odd lengths vanish completely. The partition function and the genus distribution are non-zero for even lengths where structures with fully paired bases only remain. This implies that (i) the genus distributions are different and (ii) there is a ``structural transition'' (from an ``unpaired-paired base phase'' to a ``completely paired base phase'') as $α$ approaches 1 in the extended matrix models. We compare the results of the extended RNA model with the results of G. Vernizzi, H. Orland and A. Zee in PRL 94, 168103(2005).

Submitted 23 April, 2008; v1 submitted 18 February, 2008; originally announced February 2008.

Comments: 15 pages, 4 figures, 3 tables

Showing 1–27 of 27 results for author: Garg, I