-
Activity-enhanced shear thinning of flexible linear polar polymers
Authors:
Arindam Panda,
Roland G. Winkler,
Sunil P. Singh
Abstract:
The rheological properties of tangentially propelled flexible polymers under linear shear flow are studied by computer simulations and are compared with analytical calculations. We find a significant impact of the coupled nonequilibrium active and shear forces on the polymer characteristics. The polar activity enhances shear-induced stretching along the flow direction, shrinkage in the transverse direction, and implies a strongly amplified shear-thinning behavior. The characteristic shear rate for the onset of these effects is determined by the activity. In the asymptotic limit of large activities, the shear-induced features become independent of activity, and for asymptotically large shear rates, shear dominates over activity with passive polymer behavior.
Submitted 23 May, 2025;
originally announced May 2025.
-
Accelerating Neural Network Training Along Sharp and Flat Directions
Authors:
Daniyar Zakarin,
Sidak Pal Singh
Abstract:
Recent work has highlighted a surprising alignment between gradients and the top eigenspace of the Hessian -- termed the Dominant subspace -- during neural network training. Concurrently, there has been growing interest in the distinct roles of sharp and flat directions in the Hessian spectrum. In this work, we study Bulk-SGD, a variant of SGD that restricts updates to the orthogonal complement of the Dominant subspace. Through ablation studies, we characterize the stability properties of Bulk-SGD and identify critical hyperparameters that govern its behavior. We show that updates along the Bulk subspace, corresponding to flatter directions in the loss landscape, can accelerate convergence but may compromise stability. To balance these effects, we introduce interpolated gradient methods that unify SGD, Dom-SGD, and Bulk-SGD. Finally, we empirically connect this subspace decomposition to the Generalized Gauss-Newton and Functional Hessian terms, showing that curvature energy is largely concentrated in the Dominant subspace. Our findings suggest a principled approach to designing curvature-aware optimizers.
Submitted 17 May, 2025;
originally announced May 2025.
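The subspace restriction described in the Bulk-SGD abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an orthonormal basis of the Dominant subspace is already available (in practice the top Hessian eigenvectors would have to be estimated), and the names `split_gradient`, `interpolated_step`, `alpha_dom`, and `alpha_bulk` are hypothetical. The two-weight interpolation is one plausible way to recover SGD, Dom-SGD, and Bulk-SGD as special cases; the paper's actual parametrization may differ.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def split_gradient(grad, dominant_basis):
    """Split grad into Dominant and Bulk components.

    dominant_basis is assumed to be a list of orthonormal vectors spanning
    the Dominant subspace (hypothetical input; e.g. top Hessian eigenvectors).
    """
    g_dom = [0.0] * len(grad)
    for v in dominant_basis:
        coeff = dot(grad, v)  # component of grad along this basis vector
        g_dom = [gd + coeff * vi for gd, vi in zip(g_dom, v)]
    # The Bulk part is the remainder, i.e. the orthogonal complement.
    g_bulk = [g - gd for g, gd in zip(grad, g_dom)]
    return g_dom, g_bulk


def interpolated_step(params, grad, dominant_basis, lr, alpha_dom, alpha_bulk):
    """One update step with separately weighted Dominant and Bulk parts.

    Assumed special cases: (1, 1) is plain SGD, (1, 0) is Dom-SGD,
    and (0, 1) is Bulk-SGD.
    """
    g_dom, g_bulk = split_gradient(grad, dominant_basis)
    return [
        p - lr * (alpha_dom * gd + alpha_bulk * gb)
        for p, gd, gb in zip(params, g_dom, g_bulk)
    ]


if __name__ == "__main__":
    grad = [3.0, 4.0, 5.0]
    basis = [[1.0, 0.0, 0.0]]  # toy Dominant subspace: the first coordinate axis
    print(split_gradient(grad, basis))
    print(interpolated_step([0.0, 0.0, 0.0], grad, basis, 0.1, 0.0, 1.0))
```

With the toy basis above, a Bulk-only step leaves the first coordinate unchanged and updates only the remaining two.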
-
Impedance and Stability Targeted Adaptation for Aerial Manipulator with Unknown Coupling Dynamics
Authors:
Amitabh Sharma,
Saksham Gupta,
Shivansh Pratap Singh,
Rishabh Dev Yadav,
Hongyu Song,
Wei Pan,
Spandan Roy,
Simone Baldi
Abstract:
Stable aerial manipulation during dynamic tasks such as object catching, perching, or contact with rigid surfaces necessarily requires compliant behavior, which is often achieved via impedance control. Successful manipulation depends on how effectively the impedance control can tackle the unavoidable coupling forces between the aerial vehicle and the manipulator. However, existing impedance controllers for aerial manipulators either ignore these coupling forces (in partitioned system compliance methods) or require precise knowledge of them (in complete system compliance methods). Unfortunately, such forces are very difficult to model, if at all possible. To solve this long-standing control challenge, we introduce an impedance controller for aerial manipulators that does not rely on a priori knowledge of the system dynamics or of the coupling forces. The impedance control design can address unknown coupling forces, along with system parametric uncertainties, via suitably designed adaptive laws. The closed-loop system stability is proved analytically, and experimental results with a payload-catching scenario demonstrate significant improvements in overall stability and tracking over state-of-the-art impedance controllers using either partitioned or complete system compliance.
Submitted 29 March, 2025;
originally announced April 2025.
-
Weed Detection using Convolutional Neural Network
Authors:
Santosh Kumar Tripathi,
Shivendra Pratap Singh,
Devansh Sharma,
Harshavardhan U Patekar
Abstract:
In this paper we use convolutional neural networks (CNNs) for weed detection in agricultural land. We specifically investigate the application of two CNN layer types, Conv2d and dilated Conv2d, for weed detection in crop fields. The suggested method extracts features from the input photos using pre-trained models, which are subsequently adjusted for weed detection. The experiments used a dataset of 15336 segments (3249 of soil, 7376 of soybean, 3520 of grass, and 1191 of broadleaf weeds) and show that the suggested approach detects weeds with an accuracy of 94%. This study has significant ramifications for lowering the usage of toxic herbicides and increasing the effectiveness of weed management in agriculture.
Submitted 20 February, 2025;
originally announced February 2025.
-
Avoiding spurious sharpness minimization broadens applicability of SAM
Authors:
Sidak Pal Singh,
Hossein Mobahi,
Atish Agarwala,
Yann Dauphin
Abstract:
Curvature regularization techniques like Sharpness Aware Minimization (SAM) have shown great promise in improving generalization on vision tasks. However, we find that SAM performs poorly in domains like natural language processing (NLP), often degrading performance -- even with twice the compute budget. We investigate the discrepancy across domains and find that in the NLP setting, SAM is dominated by regularization of the logit statistics -- instead of improving the geometry of the function itself. We use this observation to develop an alternative algorithm we call Functional-SAM, which regularizes curvature only through modification of the statistics of the overall function implemented by the neural network, and avoids spurious minimization through logit manipulation. Furthermore, we argue that preconditioning the SAM perturbation also prevents spurious minimization, and when combined with Functional-SAM, it gives further improvements. Our proposed algorithms show improved performance over AdamW and SAM baselines when trained for an equal number of steps, in both fixed-length and Chinchilla-style training settings, at various model scales (including billion-parameter scale). On the whole, our work highlights the importance of more precise characterizations of sharpness in broadening the applicability of curvature regularization to large language models (LLMs).
Submitted 4 February, 2025;
originally announced February 2025.
-
Heralded generation of entanglement with photons
Authors:
Imogen Forbes,
Farzad Ghafari,
Edward C. R. Deacon,
Sukhjit P. Singh,
Emilien Lavie,
Patrick Yard,
Reece D. Shaw,
Anthony Laing,
Nora Tischler
Abstract:
Entangled states of photons form the backbone of many quantum technologies. Due to the lack of effective photon-photon interactions, the generation of these states is typically probabilistic. In the prevailing but fundamentally limited generation technique, known as postselection, the target photons are measured destructively in the generation process. By contrast, in the alternative approach -- heralded state generation -- the successful creation of a desired state is verified by the detection of ancillary photons. Heralded state generation is superior to postselection in several critical ways: It enables free usage of the prepared states, allows for the success probability to be arbitrarily increased via multiplexing, and provides a scalable route to quantum information processing using photons. Here, we review theoretical proposals and experimental realizations of heralded entangled photonic state generation, as well as the impact of realistic experimental errors. We then discuss the wide-ranging applications of these states for quantum technologies, including resource states in linear optical quantum computing, entanglement swapping for repeater networks, fundamental physics, and quantum metrology.
Submitted 2 February, 2025;
originally announced February 2025.
-
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Authors:
Jim Zhao,
Sidak Pal Singh,
Aurelien Lucchi
Abstract:
The Gauss-Newton (GN) matrix plays an important role in machine learning, most evident in its use as a preconditioning matrix for a wide family of popular adaptive methods to speed up optimization. Besides, it can also provide key insights into the optimization landscape of neural networks. In the context of deep neural networks, understanding the GN matrix involves studying the interaction between different weight matrices as well as the dependencies introduced by the data, thus rendering its analysis challenging. In this work, we take a first step towards theoretically characterizing the conditioning of the GN matrix in neural networks. We establish tight bounds on the condition number of the GN in deep linear networks of arbitrary depth and width, which we also extend to two-layer ReLU networks. We expand the analysis to further architectural components, such as residual connections and convolutional layers. Finally, we empirically validate the bounds and uncover valuable insights into the influence of the analyzed architectural components.
Submitted 27 February, 2025; v1 submitted 4 November, 2024;
originally announced November 2024.
-
Controlling electronic properties of hexagonal manganites through aliovalent doping and thermoatmospheric history
Authors:
Didrik R. Småbråten,
Frida H. Danmo,
Nikolai H. Gaukås,
Sathya P. Singh,
Nikola Kanas,
Dennis Meier,
Kjell Wiik,
Mari-Ann Einarsrud,
Sverre M. Selbach
Abstract:
The family of hexagonal manganites is intensively studied for its multiferroicity, magnetoelectric coupling, improper ferroelectricity, functional domain walls, and topology-related scaling behaviors. It is established that these physical properties are co-determined by the cation sublattices and that aliovalent doping can readily be leveraged to modify them. The doping, however, also impacts the anion defect chemistry and semiconducting properties, which makes the system highly sensitive to the synthesis and processing conditions. Here, we study the electronic properties of YMnO3 as a function of aliovalent cation doping and thermoatmospheric history, combining density functional theory calculations with thermopower and thermogravimetric measurements. We show that the charge carrier concentration and transport properties can be controlled via both aliovalent cation dopants and anion defects, enabling reversible switching between n-type and p-type conductivity. This tunability is of importance for envisaged applications of hexagonal manganites in, e.g., next-generation capacitors and domain-wall nanoelectronics, or as catalysts or electrodes in fuel cells or electrolyzers. Furthermore, our approach is transferable to other transition metal oxides, providing general guidelines for controlling their semiconducting properties.
Submitted 1 November, 2024;
originally announced November 2024.
-
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Authors:
Weronika Ormaniec,
Felix Dangel,
Sidak Pal Singh
Abstract:
The Transformer architecture has inarguably revolutionized deep learning, overtaking classical architectures like multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs). At its core, the attention block differs in form and functionality from most other architectural components in deep learning--to the extent that, in comparison to MLPs/CNNs, Transformers are more often accompanied by adaptive optimizers, layer normalization, learning rate warmup, etc. The root causes behind these outward manifestations and the precise mechanisms that govern them remain poorly understood. In this work, we bridge this gap by providing a fundamental understanding of what distinguishes the Transformer from the other architectures--grounded in a theoretical comparison of the (loss) Hessian. Concretely, for a single self-attention layer, (a) we first entirely derive the Transformer's Hessian and express it in matrix derivatives; (b) we then characterize it in terms of data, weight, and attention moment dependencies; and (c) while doing so further highlight the important structural differences to the Hessian of classical networks. Our results suggest that various common architectural and optimization choices in Transformers can be traced back to their highly non-linear dependencies on the data and weight matrices, which vary heterogeneously across parameters. Ultimately, our findings provide a deeper understanding of the Transformer's unique optimization landscape and the challenges it poses.
Submitted 17 March, 2025; v1 submitted 14 October, 2024;
originally announced October 2024.
-
Artificial Intelligence-Based Opportunistic Coronary Calcium Screening in the Veterans Affairs National Healthcare System
Authors:
Raffi Hagopian,
Timothy Strebel,
Simon Bernatz,
Gregory A Myers,
Erik Offerman,
Eric Zuniga,
Cy Y Kim,
Angie T Ng,
James A Iwaz,
Sunny P Singh,
Evan P Carey,
Michael J Kim,
R Spencer Schaefer,
Jeannie Yu,
Amilcare Gentili,
Hugo JWL Aerts
Abstract:
Coronary artery calcium (CAC) is highly predictive of cardiovascular events. While millions of chest CT scans are performed annually in the United States, CAC is not routinely quantified from scans done for non-cardiac purposes. A deep learning algorithm was developed using 446 expert segmentations to automatically quantify CAC on non-contrast, non-gated CT scans (AI-CAC). Our study differs from prior works as we leverage imaging data across the Veterans Affairs national healthcare system, from 98 medical centers, capturing extensive heterogeneity in imaging protocols, scanners, and patients. AI-CAC performance on non-gated scans was compared against clinical standard ECG-gated CAC scoring. Non-gated AI-CAC differentiated zero vs. non-zero and less than 100 vs. 100 or greater Agatston scores with accuracies of 89.4% (F1 0.93) and 87.3% (F1 0.89), respectively, in 795 patients with paired gated scans within a year of a non-gated CT scan. Non-gated AI-CAC was predictive of 10-year all-cause mortality (CAC 0 vs. >400 group: 25.4% vs. 60.2%, Cox HR 3.49, p < 0.005), and composite first-time stroke, MI, or death (CAC 0 vs. >400 group: 33.5% vs. 63.8%, Cox HR 3.00, p < 0.005). In a screening dataset of 8,052 patients with low-dose lung cancer-screening CTs (LDCT), 3,091/8,052 (38.4%) individuals had AI-CAC >400. Four cardiologists qualitatively reviewed LDCT images from a random sample of >400 AI-CAC patients and verified that 527/531 (99.2%) would benefit from lipid-lowering therapy. To the best of our knowledge, this is the first non-gated CT CAC algorithm developed across a national healthcare system, on multiple imaging protocols, without filtering intra-cardiac hardware, and compared against a strong gated CT reference. We report superior performance relative to previous CAC algorithms evaluated against paired gated scans that included patients with intra-cardiac hardware.
Submitted 15 September, 2024;
originally announced September 2024.
-
Structural transitions of a Semi-Flexible Polyampholyte
Authors:
Rakesh Palariya,
Sunil P. Singh
Abstract:
Polyampholytes (PA) are charged polymers composed of positively and negatively charged monomers along their backbone. The sequence of the charged monomers and the bending of the chain significantly influence the conformation and dynamical behavior of the PA. Using coarse-grained molecular dynamics simulations, we comprehensively study the structural and dynamical properties of flexible and semi-flexible polyampholytes. The simulation results show that a flexible PA chain displays a transition from a coil to a globule in the parameter space of the charge sequence. Additionally, the behavior of the mean-square displacement (MSD), denoted as $\langle(Δr(t))^2\rangle$, reveals distinct dynamics, specifically for the alternating and charge-segregated sequences. The MSD follows a power-law behavior, $\langle(Δr(t))^2\rangle \sim t^β$, with $β\approx 3/5$ for the alternating sequence and $β\approx 1/2$ for the charge-segregated sequence in the absence of hydrodynamic interactions. However, when hydrodynamic interactions are incorporated, the exponent $β$ shifts to approximately 3/5 for the charge-segregated sequence and 2/3 for the well-mixed alternating sequence. For a semi-flexible PA chain, varying the bending rigidity and electrostatic interaction strength ($Γ_e$) leads to distinct conformational states, including globule, bundle, and torus-like conformations. We show that PA acquires circular and hairpin-like conformations in the intermediate bending regime. The transition between various conformations is identified in terms of the shape factor estimated from the ratios of eigenvalues of the gyration tensor.
Submitted 28 August, 2024;
originally announced August 2024.
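The power-law exponent $β$ quoted in the polyampholyte abstract above is typically read off as the slope of the MSD on a log-log scale. The following is a minimal sketch of such a fit, assuming MSD values sampled at positive lag times; the function name `fit_power_law_exponent` and the synthetic data are illustrative only and are not taken from the paper.

```python
import math


def fit_power_law_exponent(times, msd):
    """Estimate beta in MSD ~ t**beta by least squares on log-log data.

    times and msd are equal-length sequences of positive numbers
    (hypothetical inputs standing in for simulation output).
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in msd]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope of log(MSD) against log(t).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Synthetic data constructed with exponent 3/5; the prefactor is arbitrary.
    times = [1.0, 2.0, 4.0, 8.0, 16.0]
    msd = [2.0 * t ** 0.6 for t in times]
    print(fit_power_law_exponent(times, msd))
```

On real trajectories the fit would normally be restricted to the time window where a single scaling regime holds, since crossovers between regimes bias the slope.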
-
Local vs Global continual learning
Authors:
Giulia Lanzillotta,
Sidak Pal Singh,
Benjamin F. Grewe,
Thomas Hofmann
Abstract:
Continual learning is the problem of integrating new information in a model while retaining the knowledge acquired in the past. Despite the tangible improvements achieved in recent years, the problem of continual learning is still an open one. A better understanding of the mechanisms behind the successes and failures of existing continual learning algorithms can unlock the development of new successful strategies. In this work, we view continual learning from the perspective of the multi-task loss approximation, and we compare two alternative strategies, namely local and global approximations. We classify existing continual learning algorithms based on the approximation used, and we assess the practical effects of this distinction in common continual learning settings. Additionally, we study optimal continual learning objectives in the case of local polynomial approximations and we provide examples of existing algorithms implementing the optimal objectives.
Submitted 23 July, 2024;
originally announced July 2024.
-
Active Polar Ring Polymer in Shear Flow -- An Analytical Study
Authors:
Roland G. Winkler,
Sunil P. Singh
Abstract:
We theoretically study the conformational and dynamical properties of semiflexible active polar ring polymers under linear shear flow. A ring is described as a continuous Gaussian polymer with a tangential active force of a constant density along its contour. The linear but non-Hermitian equation of motion is solved using an eigenfunction expansion, which yields activity-independent, but shear-rate-dependent, relaxation times and activity-dependent frequencies. As a consequence, the ring's stationary-state properties are independent of activity, and its conformations as well as rheological properties are equal to those of a passive ring under shear. The presence of characteristic time scales by the relaxation and the frequency gives rise to a particular dynamical behavior. A tank-treading-like motion emerges for large relaxation times and high frequencies, specifically for stiffer rings, governed by the activity-dependent frequencies. In the case of very flexible polymers, the relaxation behavior dominates over tank-treading. Shear strongly affects the crossover from a tank-treading to a relaxation-time dominated dynamics and suppresses tank-treading. This is reflected in the tumbling frequency, which exhibits two shear-rate dependent regimes, with an activity-dependent plateau at low shear rates followed by a power-law regime with increasing tumbling frequency for large shear rates.
Submitted 3 July, 2024;
originally announced July 2024.
-
Landscaping Linear Mode Connectivity
Authors:
Sidak Pal Singh,
Linara Adilova,
Michael Kamp,
Asja Fischer,
Bernhard Schölkopf,
Thomas Hofmann
Abstract:
The presence of linear paths in parameter space between two different network solutions in certain cases, i.e., linear mode connectivity (LMC), has garnered interest from both theoretical and practical fronts. Significant research has either designed practical algorithms for connecting networks by adjusting for permutation symmetries or constructed, more theoretically, paths through which networks can be connected. Yet, the core reasons for the occurrence of LMC, when in fact it does occur, in the highly non-convex loss landscapes of neural networks are far from clear. In this work, we take a step towards understanding it by providing a model of how the loss landscape needs to behave topographically for LMC (or the lack thereof) to manifest. Concretely, we present a 'mountainside and ridge' perspective that helps to neatly tie together different geometric features that can be spotted in the loss landscape along the training runs. We also complement this perspective by providing a theoretical analysis of the barrier height, for which we provide empirical support, and which additionally extends as a faithful predictor of layer-wise LMC. We close with a toy example that provides further intuition on how barriers arise in the first place, all in all showcasing the larger aim of the work: to provide a working model of the landscape and its topography for the occurrence of LMC.
Submitted 23 June, 2024;
originally announced June 2024.
-
Collective dynamics of active dumbbells near a circular obstacle
Authors:
Chandranshu Tiwari,
Sunil P. Singh
Abstract:
In this article, we present the collective dynamics of active dumbbells in the presence of a static circular obstacle using Brownian dynamics simulation. The active dumbbells aggregate on the surface of a circular obstacle beyond a critical radius. The aggregation is non-uniform along the circumference, and the aggregate size increases with the activity and the curvature radius. The dense aggregate of active dumbbells displays persistent rotational motion with a certain angular speed, which linearly increases with the activity. Further, we show the strong polar ordering of the active dumbbells within the aggregate. The polar ordering exhibits a long-range correlation, with the correlation length corresponding to the aggregate size. Additionally, we show that the residence time of an active dumbbell on the obstacle surface grows rapidly with area fraction due to many-body interactions that lead to a slowdown of the rotational diffusion. The article further considers the dynamical behavior of a tracer particle in the solution of active dumbbells. Interestingly, the speed of the passive tracer particle displays a crossover from monotonically decreasing to increasing with the tracer particle's size upon increasing the dumbbells' speed. Furthermore, the effective diffusion of the tracer particle displays non-monotonic behavior with area fraction; the initial increase of the diffusivity is followed by a decrease at larger area fractions.
Submitted 11 June, 2024;
originally announced June 2024.
-
The MESA Security Model 2.0: A Dynamic Framework for Mitigating Stealth Data Exfiltration
Authors:
Sanjeev Pratap Singh,
Naveed Afzal
Abstract:
The rising complexity of cyber threats calls for a comprehensive reassessment of current security frameworks in business environments. This research focuses on Stealth Data Exfiltration, a significant cyber threat characterized by covert infiltration, extended undetectability, and unauthorized dissemination of confidential data. Our findings reveal that conventional defense-in-depth strategies often fall short in combating these sophisticated threats, highlighting the immediate need for a shift in information risk management across businesses. The evolving nature of cyber threats, driven by advancements in techniques such as social engineering, multi-vector attacks, and Generative AI, underscores the need for robust, adaptable, and comprehensive security strategies. As we navigate this complex landscape, it is crucial to anticipate potential threats and continually update our defenses. We propose a shift from traditional perimeter-based, prevention-focused models, which depend on a static attack surface, to a more dynamic framework that prepares for inevitable breaches. This suggested model, known as the MESA 2.0 Security Model, prioritizes swift detection, immediate response, and ongoing resilience, thereby enhancing an organization's ability to promptly identify and neutralize threats, significantly reducing the consequences of security breaches. This study suggests that businesses adopt a forward-thinking and adaptable approach to security management to stay ahead of the ever-changing cyber threat landscape.
Submitted 17 May, 2024;
originally announced May 2024.
-
Post Quantum Cryptography and its Comparison with Classical Cryptography
Authors:
Tanmay Tripathi,
Abhinav Awasthi,
Shaurya Pratap Singh,
Atul Chaturvedi
Abstract:
Cryptography plays a pivotal role in safeguarding sensitive information and facilitating secure communication. Classical cryptography relies on mathematical computations, whereas quantum cryptography operates on the principles of quantum mechanics, offering a new frontier in secure communication. Quantum cryptographic systems introduce novel dimensions to security, capable of detecting and thwarting eavesdropping attempts. By contrasting quantum cryptography with its classical counterpart, it becomes evident how quantum mechanics revolutionizes the landscape of secure communication.
Submitted 28 March, 2024;
originally announced March 2024.
-
Berezinskii-Kosterlitz-Thouless to BCS-like superconducting transition crossover driven by weak magnetic fields in ultra-thin NbN films
Authors:
Meenakshi Sharma,
Sergio Caprara,
Andrea Perali,
Surinder P. Singh,
Sandeep Singh,
Matteo Fretto,
Natascia De Leo,
Nicola Pinto
Abstract:
The Berezinskii-Kosterlitz-Thouless (BKT) transition in ultra-thin NbN films is investigated in the presence of weak perpendicular magnetic fields. A jump in the phase stiffness at the BKT transition is detected up to 5 G, while the BKT features are smeared between 5 G and 50 G, disappearing altogether at 100 G, where conventional current-voltage behaviour is observed. Our findings demonstrate that weak magnetic fields, insignificant in bulk systems, deeply affect our ultra-thin system, promoting a crossover from Halperin-Nelson fluctuations to a BCS-like state with Ginzburg-Landau fluctuations, as the field increases. This behavior is related to field-induced free vortices that screen the vortex-antivortex interaction and smear the BKT transition.
Submitted 16 July, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
$Herschel$ investigation of cores and filamentary structures in L1251 located in the Cepheus flare
Authors:
Divyansh Dewan,
Archana Soam,
Guo-Yin Zhang,
Akhil Lasrado,
Saikhom Pravash Singh,
Chang Won Lee
Abstract:
Context: Molecular clouds are the prime locations of star formation. These clouds contain filamentary structures and cores which are crucial in the formation of young stars. Aims: In this work, we aim to quantify the physical properties of structural characteristics within the molecular cloud L1251 to better understand the initial conditions for star formation. Methods: We applied the getsf algorithm to identify cores and filaments within the molecular cloud L1251 using the Herschel multiband dust continuum image, enabling us to measure their respective physical properties. Additionally, we utilized an enhanced differential term algorithm to produce high-resolution temperature maps and column density maps with a resolution of ${13.5}''$. Results: We identified 122 cores in the region. Out of them, 23 are protostellar cores, 13 are robust prestellar cores, 32 are candidate prestellar cores (including 13 robust prestellar cores and 19 strictly candidate prestellar cores), and 67 are unbound starless cores. getsf also found 147 filament structures in the region. Statistical analysis of the physical properties of the obtained cores (mass (M), temperature (T), size, and core brightness, hereafter referred to as luminosity (L)) shows a negative correlation between core mass and temperature and a positive correlation between (M/L) and (M/T). Analysis of the filaments gives a median width of 0.14 pc and no correlation between width and length. Out of those 122 cores, 92 are present in filaments (75.4%) and the remainder lie outside them. Out of the cores present in filaments, 57 (62%) cores are present in supercritical filaments ($M_{\rm line}>16M_{\odot }/{\rm pc}$).
Submitted 16 March, 2024;
originally announced March 2024.
-
Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy
Authors:
Sidak Pal Singh,
Bobby He,
Thomas Hofmann,
Bernhard Schölkopf
Abstract:
We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich directional structure of optimization trajectories, represented by their pointwise parameters. Towards this end, we introduce some natural notions of the complexity of optimization trajectories, both qualitative and quantitative, which hallmark the directional nature of optimization in neural networks: when is there redundancy, and when exploration. We use them to reveal the inherent nuance and interplay involved between various optimization choices, such as momentum and weight decay. Further, the trajectory perspective helps us see the effect of scale on regularizing the directional nature of trajectories, and as a by-product, we also observe an intriguing heterogeneity of Q,K,V dynamics in the middle attention layers in LLMs, which is homogenized by scale. Importantly, we put the significant directional redundancy observed to the test by demonstrating that training only scalar batchnorm parameters some while into training matches the performance of training the entire network, which thus exhibits the potential of hybrid optimization schemes that are geared towards efficiency.
Submitted 24 June, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Investigation of the Thermal Structure in the Atmospheric Boundary Layer During Evening Transition and the Impact of Aerosols on Radiative Cooling
Authors:
Suryadev Pratap Singh,
Mohammad Rafiuddin,
Subham Banerjee,
Sreenivas K R
Abstract:
We have explored the evening transition using data from eighty days of observations across two fog seasons at the Kempegowda International Airport, Bengaluru (KIAB). Through field experiments and simulations integrating aerosol interaction in a radiation-conduction model, we elucidate the impact of aerosols on longwave cooling of the Atmospheric Boundary Layer (ABL). Field observations indicate that under calm and clear-sky conditions, the evening transition typically results in a distinct vertical thermal structure called the Lifted Temperature Minimum (LTM). We observe that the prevailing profile near the surface post-sunset is the LTM-profile. Additionally, the occurrence of LTM is observed to increase with decreases in downward and upward longwave flux, soil sensible heat flux, wind speed, and turbulent kinetic energy measured at two meters above ground level (AGL). In such scenarios, the intensity of LTM-profiles is primarily governed by the aerosol-induced longwave heating rate (LHR) within the surface layer. Furthermore, the presence of clouds leads to increased downward flux, causing the disappearance of LTM, whereas shallow fog can enhance LTM intensity, as observed in both field observations and simulations. Prevailing radiation models usually underestimate the aerosol-induced LHR by an order of magnitude compared to actual field observations. We attribute this difference to aerosol-induced radiation divergence. We show that the impact of aerosol-induced LHR extends hundreds of meters into the inversion layer, affecting temperature profiles and potentially influencing processes such as fog formation. As the fog layer develops, LHR strengthens at its upper boundary; however, we highlight the difficulty in detecting this cooling using remote instruments such as a microwave radiometer.
Submitted 11 March, 2024;
originally announced March 2024.
-
Towards Meta-Pruning via Optimal Transport
Authors:
Alexander Theus,
Olin Geimer,
Friedrich Wicke,
Thomas Hofmann,
Sotiris Anagnostidis,
Sidak Pal Singh
Abstract:
Structural pruning of neural networks conventionally relies on identifying and discarding less important neurons, a practice often resulting in significant accuracy loss that necessitates subsequent fine-tuning efforts. This paper introduces a novel approach named Intra-Fusion, challenging this prevailing pruning paradigm. Unlike existing methods that focus on designing meaningful neuron importance metrics, Intra-Fusion redefines the overlying pruning procedure. Through utilizing the concepts of model fusion and Optimal Transport, we leverage an agnostically given importance metric to arrive at a more effective sparse model representation. Notably, our approach achieves substantial accuracy recovery without the need for resource-intensive fine-tuning, making it an efficient and promising tool for neural network compression.
Additionally, we explore how fusion can be added to the pruning process to significantly decrease the training time while maintaining competitive performance. We benchmark our results for various networks on commonly used datasets such as CIFAR-10, CIFAR-100, and ImageNet. More broadly, we hope that the proposed Intra-Fusion approach invigorates exploration into a fresh alternative to the predominant compression approaches. Our code is available here: https://github.com/alexandertheus/Intra-Fusion.
Submitted 13 February, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Formation of nano and micro scale hierarchical structures in MgO and ZnO quantum dots doped LC media: The role of competitive forces
Authors:
A. K. Singh,
S. P. Singh
Abstract:
In this paper, we have studied the effect of doping of ZnO and MgO nanoparticles (NPs) in 4-(trans-4-n-hexylcyclo-hexyl) isothiocyanatobenzoate. A thorough comparison of dielectric properties, optoelectronic properties, and calorimetric phase transition properties has been done for MgO and ZnO NP doped LC. We prepare a homogeneous mixture of MgO and ZnO NPs in toluene and transfer it into cells made of glass and Indium Tin-Oxide (ITO) coated glass. The observed microstructures in the hybrid system can be classified into three main categories: grain-like structures formed by aggregation of smaller size MgO nanoparticles while liquid crystal molecules anchor over the surfaces of nanoparticles; honeycomb-like mesostructures of an inorganic polymeric type, formed when the grain-like structures integrate further in the presence of a glass surface; and flower-like clusters of MgO nanoparticles on the ITO surface. The smaller size nanoparticles can maintain the energy balance by allowing the anchoring of liquid crystal molecules over their surfaces, whereas the larger size nanoparticles cannot maintain the energy balance with the liquid crystal molecules and are separated out to nucleate and form bigger size nanoaggregates or clusters. The energy preference of the substrate and the nanoparticle surface for liquid crystal molecules plays an important role in the formation of different types of hierarchical nano- and microstructures. We explain the formation of nano and micro scale hierarchical structures on the basis of the competition between the following interactions: NP-NP, LC-LC, NP-LC, Glass/ITO-NP, and Glass/ITO-LC. We observed a considerable change in the dielectric properties, transition temperature, bandgap, and other parameters of LC molecules when MgO NPs are doped, but only a minor change when ZnO NPs are doped in LC.
Submitted 17 January, 2024;
originally announced January 2024.
-
Characteristic features of self-avoiding active Brownian polymers under linear shear flow
Authors:
Arindam Panda,
Roland G. Winkler,
Sunil P. Singh
Abstract:
We present Brownian dynamics simulation results of a flexible linear polymer with excluded-volume interactions under shear flow in the presence of active noise. The active noise strongly affects the polymer's conformational and dynamical properties, such as the stretching in the flow direction and compression in the gradient direction, shear-induced alignment, and shear viscosity. In the asymptotic limit of large activities and shear rates, the power-law scaling exponents of these quantities differ significantly from those of passive polymers. The chain's shear-induced stretching at a given shear rate is reduced by active noise, and it displays a non-monotonic behavior, where an initial polymer compression is followed by its stretching with increasing active force.
Submitted 20 November, 2023;
originally announced November 2023.
-
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
Authors:
Vukasin Bozic,
Danilo Dordevic,
Daniele Coppola,
Joseph Thommes,
Sidak Pal Singh
Abstract:
This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks. We substitute key elements of the attention mechanism in the Transformer with simple feed-forward networks, trained using the original components via knowledge distillation. Our experiments, conducted on the IWSLT2017 dataset, reveal the capacity of these "attentionless Transformers" to rival the performance of the original architecture. Through rigorous ablation studies, and experimenting with various replacement network types and sizes, we offer insights that support the viability of our approach. This not only sheds light on the adaptability of shallow feed-forward networks in emulating attention mechanisms but also underscores their potential to streamline complex architectures for sequence-to-sequence tasks.
Submitted 4 February, 2024; v1 submitted 17 November, 2023;
originally announced November 2023.
-
P wave mesons emitting weak decays of bottom mesons
Authors:
Maninder Kaur,
Supreet Pal Singh,
R C Verma
Abstract:
This paper is the extension of our previous work entitled Searching a systematics for nonfactorizable contributions to and hadronic decays. Obtaining the factorizable contributions from the spectator quark model for a systematics has been identified among the isospin reduced amplitudes for the nonfactorizable terms among decay modes. This systematics helps us to derive a generic formula which assists to predict the branching fractions for Inspired by this observation, we extend our analysis to p wave meson emitting decays of which have similar isospin structure and make predictions for where the experimental measurements are not yet available.
Submitted 2 December, 2024; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Controlled dissipation for Rydberg atom experiments
Authors:
Bleuenn Bégoc,
Giovanni Cichelli,
Sukhjit P. Singh,
Fabio Bensch,
Valerio Amico,
Francesco Perciavalle,
Davide Rossini,
Luigi Amico,
Oliver Morsch
Abstract:
We demonstrate a simple technique for adding controlled dissipation to Rydberg atom experiments. In our experiments we excite cold rubidium atoms in a magneto-optical trap to $70$-S Rydberg states, whilst simultaneously inducing forced dissipation by resonantly coupling the Rydberg state to a hyperfine level of the short-lived $6$-P state. The resulting effective dissipation can be varied in strength and switched on and off during a single experimental cycle.
Submitted 25 October, 2024; v1 submitted 31 October, 2023;
originally announced October 2023.
-
Strategy Revision Phase with Payoff Threshold in the Public Goods Game
Authors:
Marco Alberto Javarone,
Shaurya Pratap Singh
Abstract:
Commonly, the strategy revision phase in evolutionary games relies on payoff comparison. Namely, agents compare their payoff with the opponent, assessing whether changing strategy can be potentially convenient. Even tiny payoff differences can be crucial in this decision process.
In this work, we study the dynamics of cooperation in the Public Goods Game, introducing a threshold $ε$ in the strategy revision phase. In doing so, payoff differences narrower than $ε$ reduce the decision process to a coin flip.
Interestingly, with ordinary agents, results show that payoff thresholds curb the emergence of cooperation. Yet, the latter can be sustained by these thresholds if the population is composed of conformist agents, which replace the random-based revision with selecting the strategy of the majority.
To conclude, agents sensitive only to consistent payoff differences may represent 'real-world' individuals unable to properly appreciate advantages or disadvantages when facing a dilemma. These agents may be detrimental to the emergence of cooperation or, on the contrary, supportive when endowed with a conformist attitude.
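The revision rule described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function name `revise_strategy`, the strategy labels `"C"` and `"D"`, and the `neighbour_strategies` argument are hypothetical. Two further assumptions are not stated in the abstract: outside the threshold the agent copies the opponent only if the opponent earned more, and the coin flip chooses between keeping the current strategy and copying the opponent's.

```python
import random


def revise_strategy(own_strategy, own_payoff, opp_strategy, opp_payoff,
                    epsilon, rng=random, conformist=False,
                    neighbour_strategies=()):
    """Return the strategy held after one revision step (illustrative sketch)."""
    if abs(own_payoff - opp_payoff) < epsilon:
        # Payoff difference is narrower than the threshold epsilon.
        if conformist and neighbour_strategies:
            # Conformist agent: adopt the majority strategy among the
            # (hypothetical) neighbours; keep the current one on a tie.
            cooperators = sum(1 for s in neighbour_strategies if s == "C")
            defectors = len(neighbour_strategies) - cooperators
            if cooperators > defectors:
                return "C"
            if defectors > cooperators:
                return "D"
            return own_strategy
        # Ordinary agent: the decision reduces to a coin flip
        # (assumed here to be between keeping and copying).
        return opp_strategy if rng.random() < 0.5 else own_strategy
    # Assumption: outside the threshold, imitate a better-performing opponent.
    return opp_strategy if opp_payoff > own_payoff else own_strategy
```

Passing an `rng` object with a `random()` method makes the coin flip reproducible, for example `rng=random.Random(0)`.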
Submitted 23 October, 2023;
originally announced October 2023.
-
Transformer Fusion with Optimal Transport
Authors:
Moritz Imfeld,
Jacopo Graldi,
Marco Giordano,
Thomas Hofmann,
Sotiris Anagnostidis,
Sidak Pal Singh
Abstract:
Fusion is a technique for merging multiple independently-trained neural networks in order to combine their capabilities. Past attempts have been restricted to the case of fully-connected, convolutional, and residual networks. This paper presents a systematic approach for fusing two or more transformer-based networks exploiting Optimal Transport to (soft-)align the various architectural components. We flesh out an abstraction for layer alignment, that can generalize to arbitrary architectures - in principle - and we apply this to the key ingredients of Transformers such as multi-head self-attention, layer-normalization, and residual connections, and we discuss how to handle them via various ablation studies. Furthermore, our method allows the fusion of models of different sizes (heterogeneous fusion), providing a new and efficient way to compress Transformers. The proposed approach is evaluated on both image classification tasks via Vision Transformer and natural language modeling tasks using BERT. Our approach consistently outperforms vanilla fusion, and, after a surprisingly short finetuning, also outperforms the individual converged parent models. In our analysis, we uncover intriguing insights about the significant role of soft alignment in the case of Transformers. Our results showcase the potential of fusing multiple Transformers, thus compounding their expertise, in the budding paradigm of model fusion and recombination. Code is available at https://github.com/graldij/transformer-fusion.
Submitted 22 April, 2024; v1 submitted 9 October, 2023;
originally announced October 2023.
-
Towards guarantees for parameter isolation in continual learning
Authors:
Giulia Lanzillotta,
Sidak Pal Singh,
Benjamin F. Grewe,
Thomas Hofmann
Abstract:
Deep learning has proved to be a successful paradigm for solving many challenges in machine learning. However, deep neural networks fail when trained sequentially on multiple tasks, a shortcoming known as catastrophic forgetting in the continual learning literature. Despite a recent flourish of learning algorithms successfully addressing this problem, we find that provable guarantees against catastrophic forgetting are lacking. In this work, we study the relationship between learning and forgetting by looking at the geometry of neural networks' loss landscape. We offer a unifying perspective on a family of continual learning algorithms, namely methods based on parameter isolation, and we establish guarantees on catastrophic forgetting for some of them.
Submitted 2 October, 2023;
originally announced October 2023.
-
On the curvature of the loss landscape
Authors:
Alison Pouplin,
Hrittik Roy,
Sidak Pal Singh,
Georgios Arvanitidis
Abstract:
One of the main challenges in modern deep learning is to understand why such over-parameterized models perform so well when trained on finite data. A way to analyze this generalization concept is through the properties of the associated loss landscape. In this work, we consider the loss landscape as an embedded Riemannian manifold and show that the differential geometric properties of the manifold can be used when analyzing the generalization abilities of a deep net. In particular, we focus on the scalar curvature, which can be computed analytically for our manifold, and show connections to several settings that potentially imply generalization.
Submitted 10 July, 2023;
originally announced July 2023.
-
The Hessian perspective into the Nature of Convolutional Neural Networks
Authors:
Sidak Pal Singh,
Thomas Hofmann,
Bernhard Schölkopf
Abstract:
While Convolutional Neural Networks (CNNs) have long been investigated and applied, as well as theorized, we aim to provide a slightly different perspective into their nature -- through the perspective of their Hessian maps. The reason is that the loss Hessian captures the pairwise interaction of parameters and therefore forms a natural ground to probe how the architectural aspects of CNN get manifested in its structure and properties. We develop a framework relying on Toeplitz representation of CNNs, and then utilize it to reveal the Hessian structure and, in particular, its rank. We prove tight upper bounds (with linear activations), which closely follow the empirical trend of the Hessian rank and hold in practice in more general settings. Overall, our work generalizes and establishes the key insight that, even in CNNs, the Hessian rank grows as the square root of the number of parameters.
Submitted 15 May, 2023;
originally announced May 2023.
-
OriCon3D: Effective 3D Object Detection using Orientation and Confidence
Authors:
Dhyey Manish Rajani,
Surya Pratap Singh,
Rahul Kashyap Swayampakula
Abstract:
In this paper, we propose an advanced methodology for the detection of 3D objects and precise estimation of their spatial positions from a single image. Unlike conventional frameworks that rely solely on center-point and dimension predictions, our research leverages a deep convolutional neural network-based 3D object weighted orientation regression paradigm. These estimates are then seamlessly integrated with geometric constraints obtained from a 2D bounding box, resulting in derivation of a comprehensive 3D bounding box. Our novel network design encompasses two key outputs. The first output involves the estimation of 3D object orientation through the utilization of a discrete-continuous loss function. Simultaneously, the second output predicts objectivity-based confidence scores with minimal variance. Additionally, we also introduce enhancements to our methodology through the incorporation of lightweight residual feature extractors. By combining the derived estimates with the geometric constraints inherent in the 2D bounding box, our approach significantly improves the accuracy of 3D object pose determination, surpassing baseline methodologies. Our method is rigorously evaluated on the KITTI 3D object detection benchmark, demonstrating superior performance.
Submitted 3 January, 2024; v1 submitted 27 April, 2023;
originally announced April 2023.
-
Twilight SLAM: Navigating Low-Light Environments
Authors:
Surya Pratap Singh,
Billy Mazotti,
Dhyey Manish Rajani,
Sarvesh Mayilvahanan,
Guoyuan Li,
Maani Ghaffari
Abstract:
This paper presents a detailed examination of low-light visual Simultaneous Localization and Mapping (SLAM) pipelines, focusing on the integration of state-of-the-art (SOTA) low-light image enhancement algorithms with standard and contemporary SLAM frameworks. The primary objective of our work is to address a pivotal question: Does illuminating visual input significantly improve localization accuracy in both semi-dark and dark environments? In contrast to previous works that primarily address partially dim-lit datasets, we comprehensively evaluate various low-light SLAM pipelines across obscurely-lit environments. Employing a meticulous experimental approach, we qualitatively and quantitatively assess different combinations of image enhancers and SLAM frameworks, identifying the best-performing combinations for feature-based visual SLAM. The findings advance low-light SLAM by highlighting the practical implications of enhancing visual input for improved localization accuracy in challenging lighting conditions. This paper also offers valuable insights, encouraging further exploration of visual enhancement strategies for enhanced SLAM performance in real-world scenarios.
Submitted 24 December, 2023; v1 submitted 21 April, 2023;
originally announced April 2023.
-
Leveraging Neo4j and deep learning for traffic congestion simulation & optimization
Authors:
Shyam Pratap Singh,
Arshad Ali Khan,
Riad Souissi,
Syed Adnan Yusuf
Abstract:
Traffic congestion has been a major challenge in many urban road networks. Extensive research studies have been conducted to highlight traffic-related congestion and address the issue using data-driven approaches. Currently, most traffic congestion analyses are done using simulation software that offers limited insight due to the limitations in the tools and utilities being used to render various traffic congestion scenarios. All that impacts the formulation of custom business problems which vary from place to place and country to country. By exploiting the power of the knowledge graph, we model a traffic congestion problem into the Neo4j graph and then use the load balancing, optimization algorithm to identify congestion-free road networks. We also show how traffic propagates backward in case of congestion or accident scenarios and its overall impact on other segments of the roads. We also train a sequential RNN-LSTM (Long Short-Term Memory) deep learning model on the real-time traffic data to assess the accuracy of simulation results based on a road-specific congestion. Our results show that graph-based traffic simulation, supplemented by AI ML-based traffic prediction can be more effective in estimating the congestion level in a road network.
Submitted 9 December, 2023; v1 submitted 31 March, 2023;
originally announced April 2023.
-
Accelerating cosmological models in $f(Q)$ gravity and the phase space analysis
Authors:
S. A. Narawade,
Shashank P. Singh,
B. Mishra
Abstract:
The dynamical aspects of an accelerating cosmological model are studied in this paper in the context of modified symmetric teleparallel gravity, the $f(Q)$ gravity. Initially, we derive the dynamical parameters for two well-known forms of $f(Q)$: (i) the log-square-root form and (ii) the exponential form. The equation of state (EoS) parameter for the dark energy in $f(Q)$ gravity emerges as a dynamical quantity in both models. At present, model-I shows quintessence behaviour and behaves like $Λ$CDM at late times, whereas model-II shows phantom behaviour. Further, a dynamical system analysis is performed to determine the cosmological behaviour of the models along with their stability. For both models, the critical points are obtained and the stability at each critical point is analysed with phase portraits. The evolutionary behaviour of the density parameters for the matter-dominated, radiation-dominated, and dark energy phases is also shown for both models.
Submitted 17 July, 2023; v1 submitted 11 March, 2023;
originally announced March 2023.
-
Some Fundamental Aspects about Lipschitz Continuity of Neural Networks
Authors:
Grigory Khromov,
Sidak Pal Singh
Abstract:
Lipschitz continuity is a crucial functional property of any predictive model that naturally governs its robustness, generalisation, and adversarial vulnerability. In contrast to other works that focus on obtaining tighter bounds and developing practical strategies to enforce certain Lipschitz properties, we aim to thoroughly examine and characterise the Lipschitz behaviour of Neural Networks. Thus, we carry out an empirical investigation in a range of different settings (namely, architectures, datasets, label noise, and more) by exhausting the limits of the simplest and the most general lower and upper bounds. As a highlight of this investigation, we showcase a remarkable fidelity of the lower Lipschitz bound, identify a striking Double Descent trend in both the upper and lower bounds on the Lipschitz constant, and explain the intriguing effects of label noise on function smoothness and generalisation.
Submitted 14 May, 2024; v1 submitted 21 February, 2023;
originally announced February 2023.
-
Complex phase-fluctuation effects correlated with granularity in superconducting NbN nanofilms
Authors:
Meenakshi Sharma,
Manju Singh,
Rajib K. Rakshit,
Surinder P. Singh,
Matteo Fretto,
Natascia De Leo,
Andrea Perali,
Nicola Pinto
Abstract:
Superconducting nanofilms are tunable systems that can host a 3D-2D dimensional crossover, leading to the Berezinskii-Kosterlitz-Thouless (BKT) superconducting transition on approaching the 2D regime. When the dimensionality is reduced further, from 2D to quasi-1D, superconducting nanostructures with disorder can generate quantum and thermal phase slips (PS) of the order parameter. Both BKT and PS are complex phase-fluctuation phenomena that are difficult to detect experimentally. Here, we have characterized superconducting NbN nanofilms thinner than 15 nm, on different substrates, by temperature-dependent resistivity and current-voltage (I-V) characteristics. Our measurements show clear features related to the emergence of the BKT transition and PS events. The simultaneous observation of the BKT transition and PS events in the same system, and their tunable evolution with temperature and thickness, is explained by the nano-conducting paths that form in a granular NbN system. In one of the investigated samples we have been able to trace and characterize the continuous evolution in temperature from quantum to thermal PS. Our analysis has established that the detected complex phase phenomena are strongly related to the interplay between the typical size of the nano-conductive paths and the superconducting coherence length.
Submitted 16 November, 2022;
originally announced November 2022.
-
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning
Authors:
Elias Frantar,
Sidak Pal Singh,
Dan Alistarh
Abstract:
We consider the problem of model compression for deep neural networks (DNNs) in the challenging one-shot/post-training setting, in which we are given an accurate trained model, and must compress it without any retraining, based only on a small amount of calibration input data. This problem has become popular in view of the emerging software and hardware support for executing models compressed via pruning and/or quantization with speedup, and well-performing solutions have been proposed independently for both compression approaches. In this paper, we introduce a new compression framework which covers both weight pruning and quantization in a unified setting, is time- and space-efficient, and considerably improves upon the practical performance of existing post-training methods. At the technical level, our approach is based on an exact and efficient realization of the classical Optimal Brain Surgeon (OBS) framework of [LeCun, Denker, and Solla, 1990] extended to also cover weight quantization at the scale of modern DNNs. From the practical perspective, our experimental results show that it can improve significantly upon the compression-accuracy trade-offs of existing post-training methods, and that it can enable the accurate compound application of both pruning and quantization in a post-training setting.
Submitted 8 January, 2023; v1 submitted 24 August, 2022;
originally announced August 2022.
-
Capacity Management in a Pandemic with Endogenous Patient Choices and Flows
Authors:
Sanyukta Deshpande,
Lavanya Marla,
Alan Scheller-Wolf,
Siddharth Prakash Singh
Abstract:
Motivated by the experiences of a healthcare service provider during the COVID-19 pandemic, we study the decisions of a provider that operates both an Emergency Department (ED) and a medical Clinic. Patients contact the provider through a phone call or may present directly at the ED; patients can be COVID (suspected/confirmed) or non-COVID, and have different severities. Depending on the severity, patients who contact the provider may be directed to the ED (to be seen in a few hours), be offered an appointment at the Clinic (to be seen in a few days), or be treated via phone or telemedicine, avoiding a visit to a facility. All patients make joining decisions by comparing their own risk perceptions with their anticipated benefits: they choose to enter a facility only if it is beneficial enough. Also, after initial contact, their severities may evolve, which may change their decision. The hospital system's objective is to allocate service capacity across facilities so as to minimize costs from patient deaths or defections. We model the system using a fluid approximation over multiple periods, possibly with different demand profiles. While the feasible space for this problem can be extremely complex, it is amenable to decomposition into different sub-regions that can be analyzed individually; the globally optimal solution can be reached via provably parsimonious computational methods over a single period and over multiple periods with different demand rates. Our analytical and computational results indicate that endogeneity results in non-trivial and non-intuitive capacity allocations that do not always prioritize high-severity patients, for both single and multi-period settings.
Submitted 11 July, 2022;
originally announced July 2022.
-
Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse
Authors:
Lorenzo Noci,
Sotiris Anagnostidis,
Luca Biggio,
Antonio Orvieto,
Sidak Pal Singh,
Aurelien Lucchi
Abstract:
Transformers have achieved remarkable success in several domains, ranging from natural language processing to computer vision. Nevertheless, it has been recently shown that stacking self-attention layers - the distinctive architectural component of Transformers - can result in rank collapse of the tokens' representations at initialization. The question of if and how rank collapse affects training is still largely unanswered, and its investigation is necessary for a more comprehensive understanding of this architecture. In this work, we shed new light on the causes and the effects of this phenomenon. First, we show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish at initialization. Furthermore, we provide a thorough description of the origin of rank collapse and discuss how to prevent it via an appropriate depth-dependent scaling of the residual branches. Finally, our analysis unveils that specific architectural hyperparameters affect the gradients of queries and values differently, leading to disproportionate gradient norms. This suggests an explanation for the widespread use of adaptive methods for Transformers' optimization.
Submitted 7 June, 2022;
originally announced June 2022.
-
Compression of a confined semiflexible polymer under direct and oscillating fields
Authors:
Keerthi Radhakrishnan,
Sunil P. Singh
Abstract:
The folding transition of biopolymers from the coil to compact structures has attracted wide research interest in the past and is well studied in polymer physics. Recent seminal works on DNA in confined devices have shown that these long biopolymers tend to collapse under an external field, contrary to the previously reported stretching. These long folded structures have a tendency to form knots, which has profound implications for gene regulation and various other biological functions. Such knots had been induced mechanically via optical tweezers, nanochannel confinement, etc., until recently, when uniform-field-driven compression led to self-entanglement of DNA. In this work, we capture the compression of a confined semiflexible polymer under direct and oscillating fields, using a coarse-grained computer simulation model in the presence of long-range hydrodynamics. Within this framework, we show that, when subjected to a direct field, chains in stronger confinement exhibit substantial compaction, in contrast to chains in moderate confinement or in bulk, where such compaction is absent. Interestingly, an alternating field within an optimum frequency range can effectuate this compression even in moderate or no confinement. Additionally, we show that the bending rigidity has a profound influence on the chain's folding favourability under direct and alternating fields. This field-induced collapse is a quintessential hydrodynamic phenomenon, resulting in intertwined knotted structures even for shorter chains, unlike in DNA knotting experiments, where it happens exclusively for longer chains.
Submitted 11 April, 2022;
originally announced April 2022.
-
Phenomenology of Double Descent in Finite-Width Neural Networks
Authors:
Sidak Pal Singh,
Aurelien Lucchi,
Thomas Hofmann,
Bernhard Schölkopf
Abstract:
'Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized. The current theoretical understanding behind the occurrence of this phenomenon is primarily based on linear and kernel regression models -- with informal parallels to neural networks via the Neural Tangent Kernel. Therefore, such analyses do not adequately capture the mechanisms behind double descent in finite-width neural networks, and they disregard crucial components -- such as the choice of the loss function. We address these shortcomings by leveraging influence functions to derive suitable expressions of the population loss and its lower bound, while imposing minimal assumptions on the form of the parametric model. Our derived bounds bear an intimate connection with the spectrum of the Hessian at the optimum and, importantly, exhibit double descent behaviour at the interpolation threshold. Building on our analysis, we further investigate how the loss function affects double descent -- and thus uncover interesting properties of neural networks and their Hessian spectra near the interpolation threshold.
Submitted 14 March, 2022;
originally announced March 2022.
-
A Deep Learning Approach for the Detection of COVID-19 from Chest X-Ray Images using Convolutional Neural Networks
Authors:
Aditya Saxena,
Shamsheer Pal Singh
Abstract:
COVID-19 (coronavirus) is an ongoing pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The virus was first identified in mid-December 2019 in Wuhan, in the Hubei province of China, and by now has spread throughout the planet with more than 75.5 million confirmed cases and more than 1.67 million deaths. With a limited number of COVID-19 test kits available in medical facilities, it is important to develop and implement an automatic detection system as an alternative diagnosis option for COVID-19 detection that can be used on a commercial scale. Chest X-ray is the first imaging technique that plays an important role in the diagnosis of COVID-19 disease. Computer vision and deep learning techniques can help in detecting COVID-19 from chest X-ray images. Owing to the high availability of large-scale annotated image datasets, great success has been achieved using convolutional neural networks for image analysis and classification. In this research, we propose a deep convolutional neural network trained on five open-access datasets with binary output: Normal and Covid. The performance of the model is compared with four pre-trained convolutional neural network-based models (COVID-Net, ResNet18, ResNet and MobileNet-V2), and the proposed model provides better accuracy on the validation set than the other four pre-trained models. This research work provides promising results, which can be further improved and implemented on a commercial scale.
Submitted 24 January, 2022;
originally announced January 2022.
-
Soft Actor-Critic with Cross-Entropy Policy Optimization
Authors:
Zhenyang Shi,
Surya P. N. Singh
Abstract:
Soft Actor-Critic (SAC) is one of the state-of-the-art off-policy reinforcement learning (RL) algorithms within the maximum-entropy RL framework. SAC has been demonstrated to perform very well on a range of continuous control tasks with good stability and robustness. SAC learns a stochastic Gaussian policy that maximizes a trade-off between the total expected reward and the policy entropy. To update the policy, SAC minimizes the KL-divergence between the current policy density and the soft value function density. The reparameterization trick is then used to obtain the approximate gradient of this divergence. In this paper, we propose Soft Actor-Critic with Cross-Entropy Policy Optimization (SAC-CEPO), which uses the Cross-Entropy Method (CEM) to optimize the policy network of SAC. The initial idea is to use CEM to iteratively sample the closest distribution to the soft value function density and to use the resultant distribution as a target to update the policy network. To reduce the computational complexity, we also introduce a decoupled policy structure that splits the Gaussian policy into one policy that learns the mean and another policy that learns the deviation, such that only the mean policy is trained by CEM. We show that this decoupled policy structure does converge to an optimum, and we also demonstrate by experiments that SAC-CEPO achieves competitive performance against the original SAC.
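For readers unfamiliar with the Cross-Entropy Method mentioned in this abstract, the following is a minimal, generic sketch of CEM on a one-dimensional toy problem. It is not the SAC-CEPO algorithm: the objective `toy_score`, the function name `cross_entropy_method`, and all parameter values are hypothetical choices made only for illustration.

```python
import random
import statistics


def cross_entropy_method(score, mean, std, n_samples=200, n_elite=20,
                         n_iters=30, seed=0):
    """Generic CEM: maximize score(x) over a scalar x.

    Each iteration samples from a Gaussian, keeps the highest-scoring
    ("elite") samples, and refits the Gaussian to those elites.
    """
    rng = random.Random(seed)
    for _ in range(n_iters):
        samples = [rng.gauss(mean, std) for _ in range(n_samples)]
        elites = sorted(samples, key=score, reverse=True)[:n_elite]
        mean = statistics.fmean(elites)
        # Small floor keeps the sampling distribution from collapsing to zero width.
        std = statistics.pstdev(elites) + 1e-6
    return mean, std


def toy_score(x):
    """Hypothetical objective with its maximum at x = 2."""
    return -(x - 2.0) ** 2


best_mean, best_std = cross_entropy_method(toy_score, mean=0.0, std=3.0)
```

In this sketch the refitted Gaussian plays the role of the "resultant distribution"; how SAC-CEPO uses such a distribution as a policy target is described only in the paper itself.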
Submitted 21 December, 2021;
originally announced December 2021.
-
The explicit characterization of counterion dynamics around a flexible polyelectrolyte
Authors:
Keerthi Radhakrishnan,
Sunil P. Singh
Abstract:
The article presents a comprehensive study of counterion dynamics around a generic linear polyelectrolyte (PE) chain with the help of coarse-grained computer simulations. The ion-chain coupling is discussed in terms of the binding time, the mean-square displacement (MSD) relative to the chain, the local ion transport coefficient, and spatio-temporal correlations in the effective charge. We show that a counterion exhibits sub-diffusive behavior, $\langle δR^2 \rangle \sim t^δ$ with $δ\approx0.9$, with respect to the chain's centre of mass (COM). The MSD of ions moving perpendicularly outwards from a chain segment exhibits a smaller sub-diffusive exponent than the MSD relative to the chain's COM. Further, we confirm that the effective diffusion coefficient of the counterions is strongly coupled with the chain. The effective diffusivity of an ion is lowest in close proximity to the chain, up to a length scale of the radius of gyration $R_g$. At distances beyond $R_g$, ions attain the diffusivity of a free ion, with a smooth crossover from the adsorbed regime to the free-ion regime. We show that the effective diffusivity decreases drastically for higher-valent ions, while the crossover length scale remains the same. Conversely, with increasing salt concentration the coupling length scale decreases, while the diffusivity remains unaltered. The effective diffusivity of an adsorbed ion decreases exponentially with the electrostatic interaction strength. We further corroborate this with the binding time of ions on the chain, which grows exponentially with the coupling strength of the ion-polymer pair. Moreover, the binding time of ions depends only weakly on salt concentration for monovalent salt, while for higher-valent salts the binding time decreases dramatically with concentration.
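A power-law exponent such as the one quoted in this abstract is commonly estimated as the slope of a straight-line fit in log-log space. The sketch below illustrates that generic procedure on synthetic data only; the function name `fit_power_law_exponent`, the prefactor 0.5, and the time grid are hypothetical, and nothing here reproduces the paper's simulation data or its actual fitting method.

```python
import math


def fit_power_law_exponent(times, msd):
    """Least-squares slope of log(msd) versus log(t).

    For msd = A * t**delta, the slope equals delta.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(m) for m in msd]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, noise-free MSD curve with a sub-diffusive exponent of 0.9.
times = [float(t) for t in range(1, 101)]
synthetic_msd = [0.5 * t ** 0.9 for t in times]
exponent = fit_power_law_exponent(times, synthetic_msd)
```

With real trajectories the fit would normally be restricted to the time window in which the MSD actually follows a power law.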
Submitted 16 December, 2021;
originally announced December 2021.
-
The CAT SET on the MAT: Cross Attention for Set Matching in Bipartite Hypergraphs
Authors:
Govind Sharma,
Swyam Prakash Singh,
V. Susheela Devi,
M. Narasimha Murty
Abstract:
Usual relations between entities can be captured using graphs; but those of a higher order -- more so between two different types of entities (which we term "left" and "right") -- call for a "bipartite hypergraph". For example, given a left set of symptoms and a right set of diseases, the relation between a subset of symptoms (that a patient experiences at a given point in time) and a subset of diseases (that he/she might be diagnosed with) can be well represented by a bipartite hyperedge. The state of the art in embedding nodes of a hypergraph is based on learning the self-attention structure between node pairs from a hyperedge. In the present work, given a bipartite hypergraph, we aim to capture relations between node pairs from the cross-product between the left and right hyperedges, and term it a "cross-attention" (CAT) based model. More precisely, we pose "bipartite hyperedge link prediction" as a set-matching (SETMAT) problem and propose a novel neural network architecture called CATSETMAT for it. We perform extensive experiments on multiple bipartite hypergraph datasets to show the superior performance of CATSETMAT, which we compare with multiple state-of-the-art techniques. Our results also elucidate information flow in self- and cross-attention scenarios.
Submitted 30 October, 2021;
originally announced November 2021.
-
Searching a systematics for nonfactorizable contribution to B- and B0 mesons
Authors:
Maninder Kaur,
Supreet Pal Singh,
R. C. Verma
Abstract:
Two-body weak decays / and are examined under isospin analysis to study nonfactorizable contributions. After extracting the strong phases and obtaining the factorizable contributions from spectator-quark diagrams for $N_c=3$, we determine nonfactorizable isospin amplitudes from the experimental data for these modes. Our results support the universality of the ratio of nonfactorizable isospin reduced amplitudes for these decays within experimental errors. To show that this systematics is not coincidental, we also plot our results with respect to this ratio.
Submitted 9 October, 2021; v1 submitted 6 August, 2021;
originally announced August 2021.
-
Analytic Insights into Structure and Rank of Neural Network Hessian Maps
Authors:
Sidak Pal Singh,
Gregor Bachmann,
Thomas Hofmann
Abstract:
The Hessian of a neural network captures parameter interactions through second-order derivatives of the loss. It is a fundamental object of study, closely tied to various problems in deep learning, including model design, optimization, and generalization. Most prior work has been empirical, typically focusing on low-rank approximations and heuristics that are blind to the network structure. In contrast, we develop theoretical tools to analyze the range of the Hessian map, providing us with a precise understanding of its rank deficiency as well as the structural reasons behind it. This yields exact formulas and tight upper bounds for the Hessian rank of deep linear networks, allowing for an elegant interpretation in terms of rank deficiency. Moreover, we demonstrate that our bounds remain faithful as an estimate of the numerical Hessian rank, for a larger class of models such as rectified and hyperbolic tangent networks. Further, we also investigate the implications of model architecture (e.g., width, depth, bias) on the rank deficiency. Overall, our work provides novel insights into the source and extent of redundancy in overparameterized networks.
Submitted 1 July, 2021; v1 submitted 30 June, 2021;
originally announced June 2021.
-
LiMIIRL: Lightweight Multiple-Intent Inverse Reinforcement Learning
Authors:
Aaron J. Snoswell,
Surya P. N. Singh,
Nan Ye
Abstract:
Multiple-Intent Inverse Reinforcement Learning (MI-IRL) seeks to find a reward function ensemble to rationalize demonstrations of different but unlabelled intents. Within the popular expectation maximization (EM) framework for learning probabilistic MI-IRL models, we present a warm-start strategy based on up-front clustering of the demonstrations in feature space. Our theoretical analysis shows that this warm-start solution produces a near-optimal reward ensemble, provided the behavior modes satisfy mild separation conditions. We also propose an MI-IRL performance metric that generalizes the popular Expected Value Difference measure to directly assess learned rewards against the ground-truth reward ensemble. Our metric elegantly addresses the difficulty of pairing up learned and ground-truth rewards via a min-cost flow formulation, and is efficiently computable. We also develop an MI-IRL benchmark problem that allows for more comprehensive algorithmic evaluations. On this problem, we find that our MI-IRL warm-start strategy helps avoid poor-quality local-minima reward ensembles, resulting in a significant improvement in behavior clustering. Our extensive sensitivity analysis demonstrates that the quality of the learned reward ensembles is improved under various settings, including cases where our theoretical assumptions do not necessarily hold. Finally, we demonstrate the effectiveness of our methods by discovering distinct driving styles in a large real-world dataset of driver GPS trajectories.
Submitted 3 June, 2021;
originally announced June 2021.