Half-Space Feature Learning in Neural Networks
Authors:
Mahesh Lorik Yadav,
Harish Guruprasad Ramaswamy,
Chandrashekar Lakshminarayanan
Abstract:
There currently exist two extreme viewpoints on neural network feature learning: (i) neural networks simply implement a kernel method (à la NTK) and hence no features are learned; (ii) neural networks can represent (and hence learn) intricate hierarchical features suitable for the data. We argue in this paper that neither interpretation is likely to be correct, based on a novel viewpoint. Neural networks can be viewed as a mixture of experts, where each expert corresponds to a path through a sequence of hidden units, with path length equal to the number of layers. We use this alternate interpretation to motivate a model, called the Deep Linearly Gated Network (DLGN), which sits midway between deep linear networks and ReLU networks. Unlike deep linear networks, the DLGN is capable of learning non-linear features (which are then linearly combined), and unlike ReLU networks these features are ultimately simple: each feature is effectively an indicator function for a region compactly described as an intersection of half-spaces in the input space, one half-space per layer. This viewpoint allows for a comprehensive global visualization of features, unlike the local visualizations for neurons based on saliency/activation/gradient maps. We show that feature learning does happen in DLGNs, and that its mechanism is the learning of half-spaces in the input space that contain smooth regions of the target function. Due to the structure of DLGNs, the neurons in later layers are fundamentally the same as those in earlier layers (they all represent a half-space); however, the dynamics of gradient descent impart a distinct clustering to the later-layer neurons. We hypothesize that ReLU networks also exhibit similar feature learning behaviour.
Submitted 5 April, 2024;
originally announced April 2024.
Exhaustive study of three-time periods of solar activity due to single active regions: sunspot, flare, CME, and, geo-effective characteristics
Authors:
Shirsh Lata Soni,
Manohar Lal Yadav,
Radhe Syam Gupta,
Pyare lal Verma
Abstract:
In this paper, we present a multi-wavelength study of a period of high solar activity during which a single active region produced multiple flares/CMEs. According to sunspot observations, the current solar cycle 24 appears to be less intense than recent previous sunspot cycles. In the course of the current sunspot cycle 24, several small and large sunspot groups have produced various moderate and intense flare/CME events. A few active regions with a large number of flaring activities passed across the visible disk of the Sun during 2012-2015. In this study, we consider the three periods 22-29 Oct 2013, 01-08 Nov 2013, and 25 Oct-08 Nov 2014, during which 228 flares were observed. Considering only active regions near the central part of the disk, 59 CMEs (halo or partial) have been reported, of which only 39 events are associated with flares. We conclude that an active region with a larger area, more complex morphology, and stronger magnetic field has a comparatively higher probability of producing extremely fast CMEs (speed > 1500 km/s). Among the 5 X-class flares of the reported periods, 3 (60%) are associated with a CME. For flare-associated CMEs, the lift-off time falls +15 to +30 minutes after the occurrence time of the associated flare, suggesting that the flares produce the CMEs. Additionally, we compiled the geomagnetic storms occurring within 1-5 days after CME onset. 10% of the 59 CMEs are related to a geomagnetic storm, and all of these storms are moderate.
Submitted 8 December, 2020;
originally announced December 2020.