Search | arXiv e-print repository

Hyperspectral Image Land Cover Captioning Dataset for Vision Language Models

Authors: Aryan Das, Tanishq Rachamalla, Pravendra Singh, Koushik Biswas, Vinay Kumar Verma, Swalpa Kumar Roy

Abstract: We introduce HyperCap, the first large-scale hyperspectral captioning dataset designed to enhance model performance and effectiveness in remote sensing applications. Unlike traditional hyperspectral imaging (HSI) datasets that focus solely on classification tasks, HyperCap integrates spectral data with pixel-wise textual annotations, enabling deeper semantic understanding of hyperspectral imagery. This dataset enhances model performance in tasks like classification and feature extraction, providing a valuable resource for advanced remote sensing applications. HyperCap is constructed from four benchmark datasets and annotated through a hybrid approach combining automated and manual methods to ensure accuracy and consistency. Empirical evaluations using state-of-the-art encoders and diverse fusion techniques demonstrate significant improvements in classification performance. These results underscore the potential of vision-language learning in HSI and position HyperCap as a foundational dataset for future research in the field.

Submitted 17 May, 2025; originally announced May 2025.

arXiv:2503.19362 [pdf, ps, other]

I-C-Q relations for rapidly rotating stable hybrid stars

Authors: Sujan Kumar Roy, Gargi Chaudhuri

Abstract: A number of hadronic equations of state for neutron stars have been investigated for the purpose of the present paper, considering the fact that at sufficiently high density, heavy baryons and quark phases may appear. The observational limits from NICER, GW170817, etc., are obeyed by our choice of equations of state. The universal relations are investigated for both slowly and rapidly rotating neutron stars with heavy baryons present inside the core. For slowly rotating stars, the universality of the I-Love-Q relations is verified, and the I-C-Q relations are inferred to be universal for rapidly rotating stars. Further, we extend the investigation to obtain the universal relations for compact stars containing the quark core, where the connected stable branch of such hybrid stars is considered. The parameters of the I-Love-Q and I-C-Q universal relations are obtained for slowly rotating and rapidly rotating hybrid stars, respectively. These relations would enable extracting information, within the context of general relativity, from astrophysical systems involving rapidly rotating neutron stars.

Submitted 25 March, 2025; originally announced March 2025.

Comments: Accepted for publication in Astroparticle Physics

arXiv:2503.03042 [pdf, other]

Learning from Noisy Labels with Contrastive Co-Transformer

Authors: Yan Han, Soumava Kumar Roy, Mehrtash Harandi, Lars Petersson

Abstract: Deep learning with noisy labels is an interesting challenge in weakly supervised learning. Despite their significant learning capacity, CNNs have a tendency to overfit in the presence of samples with noisy labels. Alleviating this issue, the well known Co-Training framework is used as a fundamental basis for our work. In this paper, we introduce a Contrastive Co-Transformer framework, which is simple and fast, yet able to improve the performance by a large margin compared to the state-of-the-art approaches. We argue the robustness of transformers when dealing with label noise. Our Contrastive Co-Transformer approach is able to utilize all samples in the dataset, irrespective of whether they are clean or noisy. Transformers are trained by a combination of contrastive loss and classification loss. Extensive experimental results on corrupted data from six standard benchmark datasets including Clothing1M, demonstrate that our Contrastive Co-Transformer is superior to existing state-of-the-art methods.

Submitted 4 March, 2025; originally announced March 2025.

arXiv:2501.14302 [pdf, other]

TD-RD: A Top-Down Benchmark with Real-Time Framework for Road Damage Detection

Authors: Xi Xiao, Zhengji Li, Wentao Wang, Jiacheng Xie, Houjie Lin, Swalpa Kumar Roy, Tianyang Wang, Min Xu

Abstract: Object detection has witnessed remarkable advancements over the past decade, largely driven by breakthroughs in deep learning and the proliferation of large-scale datasets. However, the domain of road damage detection remains relatively underexplored, despite its critical significance for applications such as infrastructure maintenance and road safety. This paper addresses this gap by introducing a novel top-down benchmark that offers a complementary perspective to existing datasets, specifically tailored for road damage detection. Our proposed Top Down Road Damage Detection Dataset (TDRD) includes three primary categories of road damage (cracks, potholes, and patches) captured from a top-down viewpoint. The dataset consists of 7,088 high-resolution images, encompassing 12,882 annotated instances of road damage. Additionally, we present a novel real-time object detection framework, TDYOLOV10, designed to handle the unique challenges posed by the TDRD dataset. Comparative studies with state-of-the-art models demonstrate competitive baseline results. By releasing TDRD, we aim to accelerate research in this crucial area. A sample of the dataset will be made publicly available upon the paper's acceptance.

Submitted 24 January, 2025; originally announced January 2025.

arXiv:2501.01495 [pdf, other]

doi 10.3847/1538-4357/adb3a0

Search for continuous gravitational waves from known pulsars in the first part of the fourth LIGO-Virgo-KAGRA observing run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, I. Abouelfettouh, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, D. Agarwal, M. Agathos, M. Aghaei Abchouyeh, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, C. Alléné, et al. (1794 additional authors not shown)

Abstract: Continuous gravitational wave (CW) emission from neutron stars carries information about their internal structure and equation of state, and it can provide tests of General Relativity. We present a search for CWs from a set of 45 known pulsars in the first part of the fourth LIGO-Virgo-KAGRA observing run, known as O4a. We conducted a targeted search for each pulsar using three independent analysis methods considering the single-harmonic and the dual-harmonic emission models. We find no evidence of a CW signal in O4a data for either model and set upper limits on the signal amplitude and on the ellipticity, which quantifies the asymmetry in the neutron star mass distribution. For the single-harmonic emission model, 29 targets have the upper limit on the amplitude below the theoretical spin-down limit. The lowest upper limit on the amplitude is $6.4\!\times\!10^{-27}$ for the young energetic pulsar J0537-6910, while the lowest constraint on the ellipticity is $8.8\!\times\!10^{-9}$ for the bright nearby millisecond pulsar J0437-4715. Additionally, for a subset of 16 targets we performed a narrowband search that is more robust regarding the emission model, with no evidence of a signal. We also found no evidence of non-standard polarizations as predicted by the Brans-Dicke theory.

Submitted 2 January, 2025; originally announced January 2025.

Comments: main paper: 12 pages, 6 figures, 4 tables

Report number: LIGO-P2400315

Journal ref: Astrophys.J. 983 (2025) 2, 99

arXiv:2411.03048 [pdf, other]

UNet: A Generic and Reliable Multi-UAV Communication and Networking Architecture for Heterogeneous Applications

Authors: Sanku Kumar Roy, Mohamed Samshad, Ketan Rajawat

Abstract: The rapid growth of UAV applications necessitates a robust communication and networking architecture capable of addressing the diverse requirements of various applications concurrently, rather than relying on application-specific solutions. This paper proposes a generic and reliable multi-UAV communication and networking architecture designed to support the varying demands of heterogeneous applications, including short-range and long-range communication, star and mesh topologies, different data rates, and multiple wireless standards. Our architecture accommodates both ad hoc and infrastructure networks, ensuring seamless connectivity throughout the network. Additionally, we present the design of a multi-protocol UAV gateway that enables interoperability among various communication protocols. Furthermore, we introduce a data processing and service layer framework with a graphical user interface of a ground control station that facilitates remote control and monitoring from any location at any time. We practically implemented the proposed architecture and evaluated its performance using different metrics, demonstrating its effectiveness.

Submitted 5 November, 2024; originally announced November 2024.

Comments: 11 pages, 20 figures, Journal paper

arXiv:2411.02494 [pdf, other]

doi 10.3847/2041-8213/add34a

Cosmology with Binary Neutron Stars: Does Mass-Redshift Correlation Matter?

Authors: Soumendra Kishore Roy, Lieke A. C. van Son, Anarya Ray, Will M. Farr

Abstract: Next-generation gravitational wave detectors are expected to detect millions of compact binary mergers across cosmological distances. The features of the mass distribution of these mergers, combined with gravitational wave distance measurements, will enable precise cosmological inferences, even without the need for electromagnetic counterparts. However, achieving accurate results requires modeling the mass spectrum, particularly considering possible redshift evolution. Binary neutron star (BNS) mergers are thought to be less influenced by changes in metallicity compared to binary black holes (BBH) or neutron star-black hole (NSBH) mergers. This stability in their mass spectrum over cosmic time reduces the chances of introducing biases in cosmological parameters caused by redshift evolution. In this study, we use the population synthesis code COMPAS to generate astrophysically motivated catalogs of BNS mergers and explore whether assuming a non-evolving BNS mass distribution with redshift could introduce biases in cosmological parameter inference. Our findings show that despite significant variations in the BNS mass distribution across binary physics assumptions and initial conditions in COMPAS, the joint mass-redshift population can be expressed as the product of the mass distribution marginalized over redshift and the redshift distribution marginalized over masses. This enables a 2% unbiased constraint on the Hubble constant, sufficient to address the Hubble tension. Additionally, we show that in the fiducial COMPAS setup, the bias from a non-evolving BNS mass model is less than 0.5% for the Hubble parameter measured at redshift 0.4. These results establish BNS mergers as strong candidates for spectral siren cosmology in the era of next-generation gravitational wave detectors.

Submitted 28 May, 2025; v1 submitted 4 November, 2024; originally announced November 2024.

Comments: The ApJL accepted version, 17 pages, 5 figures. Associated Zenodo link: https://zenodo.org/records/14704635 and GitHub link: https://github.com/SoumendraRoy/RedevolBNS

Report number: LIGO document number LIGO-P2400446

Journal ref: The Astrophysical Journal Letters, Volume 985, Number 2 (2025) Pages L33

arXiv:2411.02484 [pdf, other]

Not just winds: why models find binary black hole formation is metallicity dependent, while binary neutron star formation is not

Authors: L. A. C. van Son, S. K. Roy, I. Mandel, W. M. Farr, A. Lam, J. Merritt, F. S. Broekgaarden, A. Sander, J. J. Andrews

Abstract: Both detailed and rapid population studies alike predict that binary black hole (BHBH) formation is orders of magnitude more efficient at low metallicity than high metallicity, while binary neutron star (NSNS) formation remains mostly flat with metallicity, and black hole-neutron star (BHNS) mergers show intermediate behavior. This finding is a key input to employ double compact objects as tracers of low-metallicity star formation, as spectral sirens, and for merger rate calculations. Yet, the literature offers various (sometimes contradicting) explanations for these trends. We investigate the dominant cause for the metallicity dependence of double compact object formation. We find that the BHBH formation efficiency at low metallicity is set by initial condition distributions, and conventional simulations suggest that about one in eight interacting binary systems with sufficient mass to form black holes will lead to a merging BHBH. We further find that the significance of metallicities in double compact object formation is a question of formation channel. The stable mass transfer and chemically homogeneous evolution channels mainly diminish at high metallicities due to changes in stellar radii, while the common envelope channel is primarily impacted by the combined effects of stellar winds and mass-scaled natal kicks. Outdated giant wind prescriptions exacerbate the latter effect, suggesting BHBH formation may be much less metallicity dependent than previously assumed. NSNS formation efficiency remains metallicity independent as they form exclusively through the common envelope channel, with natal kicks that are assumed uncorrelated with mass. Forthcoming GW observations will provide valuable constraints on these findings.

Submitted 12 December, 2024; v1 submitted 4 November, 2024; originally announced November 2024.

Comments: 15 pages, 8 Figures, Submitted to ApJ, Scripts and data to reproduce this work are at https://github.com/LiekeVanSon/ZdependentFormEff , and https://zenodo.org/records/13999532

arXiv:2410.16565 [pdf, other]

doi 10.3847/1538-4357/adc681

Search for gravitational waves emitted from SN 2023ixf

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, I. Abouelfettouh, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, D. Agarwal, M. Agathos, M. Aghaei Abchouyeh, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, C. Alléné, A. Allocca, et al. (1758 additional authors not shown)

Abstract: We present the results of a search for gravitational-wave transients associated with core-collapse supernova SN 2023ixf, which was observed in the galaxy Messier 101 via optical emission on 2023 May 19th, during the LIGO-Virgo-KAGRA 15th Engineering Run. We define a five-day on-source window during which an accompanying gravitational-wave signal may have occurred. No gravitational waves have been identified in data when at least two gravitational-wave observatories were operating, which covered $\sim 14\%$ of this five-day window. We report the search detection efficiency for various possible gravitational-wave emission models. Considering the distance to M101 (6.7 Mpc), we derive constraints on the gravitational-wave emission mechanism of core-collapse supernovae across a broad frequency spectrum, ranging from 50 Hz to 2 kHz where we assume the gravitational-wave emission occurred when coincident data are available in the on-source window. Considering an ellipsoid model for a rotating proto-neutron star, our search is sensitive to gravitational-wave energy $1 \times 10^{-4} M_{\odot} c^2$ and luminosity $2.6 \times 10^{-4} M_{\odot} c^2/s$ for a source emitting at 82 Hz. These constraints are around an order of magnitude more stringent than those obtained so far with gravitational-wave data. The constraint on the ellipticity of the proto-neutron star that is formed is as low as 1.08, at frequencies above 1200 Hz, surpassing past results.

Submitted 11 March, 2025; v1 submitted 21 October, 2024; originally announced October 2024.

Comments: Main paper: 6 pages, 4 figures and 1 table. Total with appendices: 20 pages, 4 figures, and 1 table

Report number: LIGO-P2400125

Journal ref: ApJ 985 183 (2025)

arXiv:2410.09151 [pdf, other]

doi 10.3847/1538-4357/ad8de0

A search using GEO600 for gravitational waves coincident with fast radio bursts from SGR 1935+2154

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, I. Abouelfettouh, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, D. Agarwal, M. Agathos, M. Aghaei Abchouyeh, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, C. Alléné, et al. (1758 additional authors not shown)

Abstract: The magnetar SGR 1935+2154 is the only known Galactic source of fast radio bursts (FRBs). FRBs from SGR 1935+2154 were first detected by CHIME/FRB and STARE2 in 2020 April, after the conclusion of the LIGO, Virgo, and KAGRA Collaborations' O3 observing run. Here we analyze four periods of gravitational wave (GW) data from the GEO600 detector coincident with four periods of FRB activity detected by CHIME/FRB, as well as X-ray glitches and X-ray bursts detected by NICER and NuSTAR close to the time of one of the FRBs. We do not detect any significant GW emission from any of the events. Instead, using a short-duration GW search (for bursts $\leq$ 1 s) we derive 50% (90%) upper limits of $10^{48}$ ($10^{49}$) erg for GWs at 300 Hz and $10^{49}$ ($10^{50}$) erg at 2 kHz, and constrain the GW-to-radio energy ratio to $\leq 10^{14} - 10^{16}$. We also derive upper limits from a long-duration search for bursts with durations between 1 and 10 s. These represent the strictest upper limits on concurrent GW emission from FRBs.

Submitted 21 May, 2025; v1 submitted 11 October, 2024; originally announced October 2024.

Comments: 15 pages of text including references, 4 figures, 5 tables

Report number: LIGO-P2400192

Journal ref: ApJ 977 255 (2024)

arXiv:2408.01372 [pdf, other]

doi 10.1016/j.neucom.2025.129995

Spatial and Spatial-Spectral Morphological Mamba for Hyperspectral Image Classification

Authors: Muhammad Ahmad, Muhammad Hassaan Farooq Butt, Adil Mehmood Khan, Manuel Mazzara, Salvatore Distefano, Muhammad Usama, Swalpa Kumar Roy, Jocelyn Chanussot, Danfeng Hong

Abstract: Recent advancements in transformers, specifically self-attention mechanisms, have significantly improved hyperspectral image (HSI) classification. However, these models often suffer from inefficiencies, as their computational complexity scales quadratically with sequence length. To address these challenges, we propose the morphological spatial Mamba (SMM) and morphological spatial-spectral Mamba (SSMM) model (MorpMamba), which combines the strengths of morphological operations and the state space model framework, offering a more computationally efficient alternative to transformers. In MorpMamba, a novel token generation module first converts HSI patches into spatial-spectral tokens. These tokens are then processed through morphological operations such as erosion and dilation, utilizing depthwise separable convolutions to capture structural and shape information. A token enhancement module refines these features by dynamically adjusting the spatial and spectral tokens based on central HSI regions, ensuring effective feature fusion within each block. Subsequently, multi-head self-attention is applied to further enrich the feature representations, allowing the model to capture complex relationships and dependencies within the data. Finally, the enhanced tokens are fed into a state space module, which efficiently models the temporal evolution of the features for classification. Experimental results on widely used HSI datasets demonstrate that MorpMamba achieves superior parametric efficiency compared to traditional CNN and transformer models while maintaining high accuracy. The code will be made publicly available at https://github.com/mahmad000/MorpMamba.

Submitted 30 November, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

arXiv:2407.14352 [pdf, ps, other]

Vision-Based Power Line Cables and Pylons Detection for Low Flying Aircraft

Authors: Jakub Gwizdała, Doruk Oner, Soumava Kumar Roy, Mian Akbar Shah, Ad Eberhard, Ivan Egorov, Philipp Krüsi, Grigory Yakushev, Pascal Fua

Abstract: Power lines are dangerous for low-flying aircraft, especially in low-visibility conditions. Thus, a vision-based system able to analyze the aircraft's surroundings and to provide the pilots with a "second pair of eyes" can contribute to enhancing their safety. To this end, we have developed a deep learning approach to jointly detect power line cables and pylons from images captured at distances of several hundred meters by aircraft-mounted cameras. In doing so, we have combined a modern convolutional architecture with transfer learning and a loss function adapted to curvilinear structure delineation. We use a single network for both detection tasks and demonstrate its performance on two benchmarking datasets. We have integrated it within an onboard system and run it in flight, and have demonstrated with our experiments that it outperforms the prior distant cable detection method on both datasets, while also successfully detecting pylons, provided their annotations are available for the data.

Submitted 30 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

Comments: Added several declarations at the end of the publication

arXiv:2407.12867 [pdf, other]

Swift-BAT GUANO follow-up of gravitational-wave triggers in the third LIGO-Virgo-KAGRA observing run

Authors: Gayathri Raman, Samuele Ronchini, James Delaunay, Aaron Tohuvavohu, Jamie A. Kennea, Tyler Parsotan, Elena Ambrosi, Maria Grazia Bernardini, Sergio Campana, Giancarlo Cusumano, Antonino D'Ai, Paolo D'Avanzo, Valerio D'Elia, Massimiliano De Pasquale, Simone Dichiara, Phil Evans, Dieter Hartmann, Paul Kuin, Andrea Melandri, Paul O'Brien, Julian P. Osborne, Kim Page, David M. Palmer, Boris Sbarufatti, Gianpiero Tagliaferri, et al. (1797 additional authors not shown)

Abstract: We present results from a search for X-ray/gamma-ray counterparts of gravitational-wave (GW) candidates from the third observing run (O3) of the LIGO-Virgo-KAGRA (LVK) network using the Swift Burst Alert Telescope (Swift-BAT). The search includes 636 GW candidates received in low latency, 86 of which have been confirmed by the offline analysis and included in the third cumulative Gravitational-Wave Transient Catalog (GWTC-3). Targeted searches were carried out on the entire GW sample using the maximum-likelihood NITRATES pipeline on the BAT data made available via the GUANO infrastructure. We do not detect any significant electromagnetic emission that is temporally and spatially coincident with any of the GW candidates. We report flux upper limits in the 15-350 keV band as a function of sky position for all the catalog candidates. For GW candidates where the Swift-BAT false alarm rate is less than 10$^{-3}$ Hz, we compute the GW-BAT joint false alarm rate. Finally, the derived Swift-BAT upper limits are used to infer constraints on the putative electromagnetic emission associated with binary black hole mergers.

Submitted 27 March, 2025; v1 submitted 13 July, 2024; originally announced July 2024.

Comments: Update to version accepted for publication in ApJ. 50 pages, 10 figures, 4 tables

Journal ref: ApJ, Volume 980, 2025, 207

arXiv:2407.05088 [pdf, other]

Leveraging Task-Specific Knowledge from LLM for Semi-Supervised 3D Medical Image Segmentation

Authors: Suruchi Kumari, Aryan Das, Swalpa Kumar Roy, Indu Joshi, Pravendra Singh

Abstract: Traditional supervised 3D medical image segmentation models need voxel-level annotations, which require huge human effort, time, and cost. Semi-supervised learning (SSL) addresses this limitation of supervised learning by facilitating learning with a limited amount of annotated and a larger amount of unannotated training samples. However, state-of-the-art SSL models still struggle to fully exploit the potential of learning from unannotated samples. To facilitate effective learning from unannotated data, we introduce LLM-SegNet, which exploits a large language model (LLM) to integrate task-specific knowledge into our co-training framework. This knowledge aids the model in comprehensively understanding the features of the region of interest (ROI), ultimately leading to more efficient segmentation. Additionally, to further reduce erroneous segmentation, we propose a Unified Segmentation loss function. This loss function reduces erroneous segmentation by not only prioritizing regions where the model is confident in predicting between foreground or background pixels but also effectively addressing areas where the model lacks high confidence in predictions. Experiments on publicly available Left Atrium, Pancreas-CT, and Brats-19 datasets demonstrate the superior performance of LLM-SegNet compared to the state-of-the-art. Furthermore, we conducted several ablation studies to demonstrate the effectiveness of various modules and loss functions leveraged by LLM-SegNet.

Submitted 6 July, 2024; originally announced July 2024.

Comments: Under Review

arXiv:2406.16993 [pdf, other]

Are Vision xLSTM Embedded UNet More Reliable in Medical 3D Image Segmentation?

Authors: Pallabi Dutta, Soham Bose, Swalpa Kumar Roy, Sushmita Mitra

Abstract: The development of efficient segmentation strategies for medical images has evolved from its initial dependence on Convolutional Neural Networks (CNNs) to the current investigation of hybrid models that combine CNNs with Vision Transformers. There is an increasing focus on creating architectures that are both high-performance and computationally efficient, able to be deployed on remote systems with limited resources. Although transformers can capture global dependencies in the input space, they face challenges from the corresponding high computational and storage expenses involved. This paper investigates the integration of CNNs with Vision Extended Long Short-Term Memory (Vision-xLSTM)s by introducing the novel U-VixLSTM. The Vision-xLSTM blocks capture temporal and global relationships within the patches, as extracted from the CNN feature maps. The convolutional feature reconstruction path upsamples the output volume from the Vision-xLSTM blocks, to produce the segmentation output. Our primary objective is to propose that Vision-xLSTM forms an appropriate backbone for medical image segmentation, offering excellent performance with reduced computational costs. The U-VixLSTM exhibits superior performance, compared to the state-of-the-art networks in the publicly available Synapse, ISIC and ACDC datasets. Code provided: https://github.com/duttapallabi2907/U-VixLSTM

Submitted 18 December, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.15719 [pdf, other]

How to Learn More? Exploring Kolmogorov-Arnold Networks for Hyperspectral Image Classification

Authors: Ali Jamali, Swalpa Kumar Roy, Danfeng Hong, Bing Lu, Pedram Ghamisi

Abstract: Convolutional Neural Networks (CNNs) and vision transformers (ViTs) have shown excellent capability in complex hyperspectral image (HSI) classification. However, these models require a significant amount of training data and computational resources. On the other hand, modern Multi-Layer Perceptrons (MLPs) have demonstrated great classification capability. These modern MLP-based models require significantly less training data compared to CNNs and ViTs, achieving state-of-the-art classification accuracy. Recently, Kolmogorov-Arnold Networks (KANs) were proposed as viable alternatives for MLPs. Because of their internal similarity to splines and their external similarity to MLPs, KANs are able to optimize learned features with remarkable accuracy in addition to being able to learn new features. Thus, in this study, we assess the effectiveness of KANs for complex HSI data classification. Moreover, to enhance the HSI classification accuracy obtained by the KANs, we develop and propose a hybrid architecture utilizing 1D, 2D, and 3D KANs. To demonstrate the effectiveness of the proposed KAN architecture, we conducted extensive experiments on three newly created HSI benchmark datasets: QUH-Pingan, QUH-Tangdaowan, and QUH-Qingyun. The results underscored the competitive or better capability of the developed hybrid KAN-based model across these benchmark datasets over several other CNN- and ViT-based algorithms, including 1D-CNN, 2D-CNN, 3D-CNN, VGG-16, ResNet-50, EfficientNet, RNN, and ViT. The code is publicly available at https://github.com/aj1365/HSIConvKAN

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2405.12328 [pdf, other]

Multi-dimension Transformer with Attention-based Filtering for Medical Image Segmentation

Authors: Wentao Wang, Xi Xiao, Mingjie Liu, Qing Tian, Xuanyao Huang, Qizhen Lan, Swalpa Kumar Roy, Tianyang Wang

Abstract: The accurate segmentation of medical images is crucial for diagnosing and treating diseases. Recent studies demonstrate that vision transformer-based methods have significantly improved performance in medical image segmentation, primarily due to their superior ability to establish global relationships among features and adaptability to various inputs. However, these methods struggle with the low signal-to-noise ratio inherent to medical images. Additionally, the effective utilization of channel and spatial information, which are essential for medical image segmentation, is limited by the representation capacity of self-attention. To address these challenges, we propose a multi-dimension transformer with attention-based filtering (MDT-AF), which redesigns the patch embedding and self-attention mechanism for medical image segmentation. MDT-AF incorporates an attention-based feature filtering mechanism into the patch embedding blocks and employs a coarse-to-fine process to mitigate the impact of low signal-to-noise ratio. To better capture complex structures in medical images, MDT-AF extends the self-attention mechanism to incorporate spatial and channel dimensions, enriching feature representation. Moreover, we introduce an interaction mechanism to improve the feature aggregation between spatial and channel dimensions. Experimental results on three public medical image segmentation benchmarks show that MDT-AF achieves state-of-the-art (SOTA) performance.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.11687 [pdf, other]

Crossing The Gap Using Variational Quantum Eigensolver: A Comparative Study

Authors: I-Chi Chen, Nouhaila Innan, Suman Kumar Roy, Jason Saroni

Abstract: Within the evolving domain of quantum computational chemistry, the Variational Quantum Eigensolver (VQE) has been developed to explore not only the ground state but also the excited states of molecules. In this study, we compare the performance of Variational Quantum Deflation (VQD) and Subspace-Search Variational Quantum Eigensolver (SSVQE) methods in determining the low-lying excited states of $LiH$. Our investigation reveals that while VQD exhibits a slight advantage in accuracy, SSVQE stands out for its efficiency, allowing the determination of all low-lying excited states through a single parameter optimization procedure. We further evaluate the effectiveness of optimizers, including Gradient Descent (GD), Quantum Natural Gradient (QNG), and Adam optimizer, in obtaining $LiH$'s first excited state, with the Adam optimizer demonstrating superior efficiency in requiring the fewest iterations. Moreover, we propose a novel approach combining Folded Spectrum VQE (FS-VQE) with either VQD or SSVQE, enabling the exploration of highly excited states. We test the new approaches for finding all three $H_4$'s excited states. Folded Spectrum SSVQE (FS-SSVQE) can find all three highly excited states near $-1.0$ Ha with only one optimizing procedure, but the procedure converges slowly. In contrast, although Folded spectrum VQD (FS-VQD) gets highly excited states with individual optimizing procedures, the optimizing procedure converges faster.

Submitted 15 June, 2024; v1 submitted 19 May, 2024; originally announced May 2024.

Comments: 10 pages, 8 figures

arXiv:2404.19341 [pdf, other]

Reliable or Deceptive? Investigating Gated Features for Smooth Visual Explanations in CNNs

Authors: Soham Mitra, Atri Sukul, Swalpa Kumar Roy, Pravendra Singh, Vinay Verma

Abstract: Deep learning models have achieved remarkable success across diverse domains. However, the intricate nature of these models often impedes a clear understanding of their decision-making processes. This is where Explainable AI (XAI) becomes indispensable, offering intuitive explanations for model decisions. In this work, we propose a simple yet highly effective approach, ScoreCAM++, which introduces modifications to enhance the promising ScoreCAM method for visual explainability. Our proposed approach involves altering the normalization function within the activation layer utilized in ScoreCAM, resulting in significantly improved results compared to previous efforts. Additionally, we apply an activation function to the upsampled activation layers to enhance interpretability. This improvement is achieved by selectively gating lower-priority values within the activation layer. Through extensive experiments and qualitative comparisons, we demonstrate that ScoreCAM++ consistently achieves notably superior performance and fairness in interpreting the decision-making process compared to both ScoreCAM and previous methods.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.04248 [pdf, other]

doi 10.3847/2041-8213/ad5beb

Observation of Gravitational Waves from the Coalescence of a $2.5\text{-}4.5~M_\odot$ Compact Object and a Neutron Star

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, I. Abouelfettouh, F. Acernese, K. Ackley, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, D. Agarwal, M. Agathos, M. Aghaei Abchouyeh, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, S. Akçay, T. Akutsu, S. Albanesi, R. A. Alfaidi, A. Al-Jodah, et al. (1771 additional authors not shown)

Abstract: We report the observation of a coalescing compact binary with component masses $2.5\text{-}4.5~M_\odot$ and $1.2\text{-}2.0~M_\odot$ (all measurements quoted at the 90% credible level). The gravitational-wave signal GW230529_181500 was observed during the fourth observing run of the LIGO-Virgo-KAGRA detector network on 2023 May 29 by the LIGO Livingston Observatory. The primary component of the source has a mass less than $5~M_\odot$ at 99% credibility. We cannot definitively determine from gravitational-wave data alone whether either component of the source is a neutron star or a black hole. However, given existing estimates of the maximum neutron star mass, we find the most probable interpretation of the source to be the coalescence of a neutron star with a black hole that has a mass between the most massive neutron stars and the least massive black holes observed in the Galaxy. We provisionally estimate a merger rate density of $55^{+127}_{-47}~\text{Gpc}^{-3}\,\text{yr}^{-1}$ for compact binary coalescences with properties similar to the source of GW230529_181500; assuming that the source is a neutron star-black hole merger, GW230529_181500-like sources constitute about 60% of the total merger rate inferred for neutron star-black hole coalescences. The discovery of this system implies an increase in the expected rate of neutron star-black hole mergers with electromagnetic counterparts and provides further evidence for compact objects existing within the purported lower mass gap.

Submitted 26 July, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

Comments: 45 pages (10 pages author list, 13 pages main text, 1 page acknowledgements, 13 pages appendices, 8 pages bibliography), 17 figures, 16 tables. Update to match version published in The Astrophysical Journal Letters. Data products available from https://zenodo.org/records/10845779

Report number: LIGO-P2300352

Journal ref: ApJL 970, L34 (2024)

arXiv:2403.03004 [pdf, other]

Ultralight vector dark matter search using data from the KAGRA O3GK run

Authors: The LIGO Scientific Collaboration, the Virgo Collaboration, the KAGRA Collaboration, A. G. Abac, R. Abbott, H. Abe, I. Abouelfettouh, F. Acernese, K. Ackley, C. Adamcewicz, S. Adhicary, N. Adhikari, R. X. Adhikari, V. K. Adkins, V. B. Adya, C. Affeldt, D. Agarwal, M. Agathos, O. D. Aguiar, I. Aguilar, L. Aiello, A. Ain, P. Ajith, T. Akutsu, S. Albanesi, et al. (1778 additional authors not shown)

Abstract: Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.

Submitted 5 March, 2024; originally announced March 2024.

Comments: 20 pages, 5 figures

Report number: LIGO-P2300250

arXiv:2403.00396 [pdf, other]

doi 10.1109/ISBI56570.2024.10635344

GLFNET: Global-Local (frequency) Filter Networks for efficient medical image segmentation

Authors: Athanasios Tragakis, Qianying Liu, Chaitanya Kaul, Swalpa Kumar Roy, Hang Dai, Fani Deligianni, Roderick Murray-Smith, Daniele Faccio

Abstract: We propose a novel transformer-style architecture called Global-Local Filter Network (GLFNet) for medical image segmentation and demonstrate its state-of-the-art performance. We replace the self-attention mechanism with a combination of global-local filter blocks to optimize model efficiency. The global filters extract features from the whole feature map whereas the local filters are being adaptively created as 4x4 patches of the same feature map and add restricted scale information. In particular, the feature extraction takes place in the frequency domain rather than the commonly used spatial (image) domain to facilitate faster computations. The fusion of information from both spatial and frequency spaces creates an efficient model with regards to complexity, required data and performance. We test GLFNet on three benchmark datasets achieving state-of-the-art performance on all of them while being almost twice as efficient in terms of GFLOP operations.

Submitted 1 March, 2024; originally announced March 2024.

Journal ref: 2024 IEEE International Symposium on Biomedical Imaging (ISBI)

arXiv:2402.18605 [pdf, other]

FORML: A Riemannian Hessian-free Method for Meta-learning on Stiefel Manifolds

Authors: Hadi Tabealhojeh, Soumava Kumar Roy, Peyman Adibi, Hossein Karshenas

Abstract: Meta-learning problem is usually formulated as a bi-level optimization in which the task-specific and the meta-parameters are updated in the inner and outer loops of optimization, respectively. However, performing the optimization in the Riemannian space, where the parameters and meta-parameters are located on Riemannian manifolds is computationally intensive. Unlike the Euclidean methods, the Riemannian backpropagation needs computing the second-order derivatives that include backward computations through the Riemannian operators such as retraction and orthogonal projection. This paper introduces a Hessian-free approach that uses a first-order approximation of derivatives on the Stiefel manifold. Our method significantly reduces the computational load and memory footprint. We show how using a Stiefel fully-connected layer that enforces orthogonality constraint on the parameters of the last classification layer as the head of the backbone network, strengthens the representation reuse of the gradient-based meta-learning methods. Our experimental results across various few-shot learning datasets, demonstrate the superiority of our proposed method compared to the state-of-the-art methods, especially MAML, its Euclidean counterpart.

Submitted 31 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.11036 [pdf, other]

Occlusion Resilient 3D Human Pose Estimation

Authors: Soumava Kumar Roy, Ilia Badanin, Sina Honari, Pascal Fua

Abstract: Occlusions remain one of the key challenges in 3D body pose estimation from single-camera video sequences. Temporal consistency has been extensively used to mitigate their impact but the existing algorithms in the literature do not explicitly model them. Here, we apply this by representing the deforming body as a spatio-temporal graph. We then introduce a refinement network that performs graph convolutions over this graph to output 3D poses. To ensure robustness to occlusions, we train this network with a set of binary masks that we use to disable some of the edges as in drop-out techniques. In effect, we simulate the fact that some joints can be hidden for periods of time and train the network to be immune to that. We demonstrate the effectiveness of this approach compared to state-of-the-art techniques that infer poses from single-camera sequences.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2312.10407 [pdf, ps, other]

DeepArt: A Benchmark to Advance Fidelity Research in AI-Generated Content

Authors: Wentao Wang, Xuanyao Huang, Tianyang Wang, Swalpa Kumar Roy

Abstract: This paper explores the image synthesis capabilities of GPT-4, a leading multi-modal large language model. We establish a benchmark for evaluating the fidelity of texture features in images generated by GPT-4, comprising manually painted pictures and their AI-generated counterparts. The contributions of this study are threefold: First, we provide an in-depth analysis of the fidelity of image synthesis features based on GPT-4, marking the first such study on this state-of-the-art model. Second, the quantitative and qualitative experiments fully reveal the limitations of the GPT-4 model in image synthesis. Third, we have compiled a unique benchmark of manual drawings and corresponding GPT-4-generated images, introducing a new task to advance fidelity research in AI-generated content (AIGC). The dataset is available at: https://github.com/rickwang28574/DeepArt.

Submitted 24 December, 2023; v1 submitted 16 December, 2023; originally announced December 2023.

Comments: This is the second version of this work; new contributors have joined and the content has been substantially revised

arXiv:2312.03946 [pdf, other]

A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement

Authors: Risab Biswas, Swalpa Kumar Roy, Umapada Pal

Abstract: Document image enhancement is a fundamental and important stage for attaining the best performance in any document analysis assignment because there are many degradation situations that could harm document images, making it more difficult to recognize and analyze them. In this paper, we propose T2T-BinFormer which is a novel document binarization encoder-decoder architecture based on a Tokens-to-token vision transformer. Each image is divided into a set of tokens with a defined length using the ViT model, which is then applied several times to model the global relationship between the tokens. However, the conventional tokenization of input data does not adequately reflect the crucial local structure between adjacent pixels of the input image, which results in low efficiency. Instead of using a simple ViT and hard splitting of images for the document image enhancement task, we employed a progressive tokenization technique to capture this local information from an image to achieve more effective results. Experiments on various DIBCO and H-DIBCO benchmarks demonstrate that the proposed model outperforms the existing CNN and ViT-based state-of-the-art methods. In this research, the primary area of examination is the application of the proposed architecture to the task of document binarization. The source code will be made available at https://github.com/RisabBiswas/T2T-BinFormer.

Submitted 6 December, 2023; originally announced December 2023.

Comments: arXiv admin note: text overlap with arXiv:2312.03568

arXiv:2312.03568 [pdf, other]

DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization

Authors: Risab Biswas, Swalpa Kumar Roy, Ning Wang, Umapada Pal, Guang-Bin Huang

Abstract: In real life, various degradation scenarios exist that might damage document images, making it harder to recognize and analyze them, thus binarization is a fundamental and crucial step for achieving the most optimal performance in any document analysis task. We propose DocBinFormer (Document Binarization Transformer), a novel two-level vision transformer (TL-ViT) architecture based on vision transformers for effective document image binarization. The presented architecture employs a two-level transformer encoder to effectively capture both global and local feature representation from the input images. These complementary bi-level features are exploited for efficient document image binarization, resulting in improved results for system-generated as well as handwritten document images in a comprehensive approach. With the absence of convolutional layers, the transformer encoder uses the pixel patches and sub-patches along with their positional information to operate directly on them, while the decoder generates a clean (binarized) output image from the latent representation of the patches. Instead of using a simple vision transformer block to extract information from the image patches, the proposed architecture uses two transformer blocks for greater coverage of the extracted feature space on a global and local scale. The encoded feature representation is used by the decoder block to generate the corresponding binarized output. Extensive experiments on a variety of DIBCO and H-DIBCO benchmarks show that the proposed model outperforms state-of-the-art techniques on four metrics. The source code will be made available at https://github.com/RisabBiswas/DocBinFormer.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2310.04332 [pdf, other]

On the Parameterized Complexity of Multiway Near-Separator

Authors: Bart M. P. Jansen, Shivesh K. Roy

Abstract: We study a new graph separation problem called Multiway Near-Separator. Given an undirected graph $G$, integer $k$, and terminal set $T \subseteq V(G)$, it asks whether there is a vertex set $S \subseteq V(G) \setminus T$ of size at most $k$ such that in graph $G-S$, no pair of distinct terminals can be connected by two pairwise internally vertex-disjoint paths. Hence each terminal pair can be separated in $G-S$ by removing at most one vertex. The problem is therefore a generalization of (Node) Multiway Cut, which asks for a vertex set for which each terminal is in a different component of $G-S$. We develop a fixed-parameter tractable algorithm for Multiway Near-Separator running in time $2^{O(k \log k)} * n^{O(1)}$. Our algorithm is based on a new pushing lemma for solutions with respect to important separators, along with two problem-specific ingredients. The first is a polynomial-time subroutine to reduce the number of terminals in the instance to a polynomial in the solution size $k$ plus the size of a given suboptimal solution. The second is a polynomial-time algorithm that, given a graph $G$ and terminal set $T \subseteq V(G)$ along with a single vertex $x \in V(G)$ that forms a multiway near-separator, computes a 14-approximation for the problem of finding a multiway near-separator not containing $x$.

Submitted 6 October, 2023; originally announced October 2023.

Comments: Conference version to appear at the International Symposium on Parameterized and Exact Computation (IPEC 2023)

arXiv:2308.11465 [pdf, other]

Computation of covariant Lyapunov vectors using data assimilation

Authors: Shashank Kumar Roy, Amit Apte

Abstract: Computing Lyapunov vectors from partial and noisy observations is a challenging problem. We propose a method using data assimilation to approximate the Lyapunov vectors using the estimate of the underlying trajectory obtained from the filter mean. We then extensively study the sensitivity of these approximate Lyapunov vectors and the corresponding Oseledets' subspaces to the perturbations in the underlying true trajectory. We demonstrate that this sensitivity is consistent with and helps explain the errors in the approximate Lyapunov vectors from the estimated trajectory of the filter. Using the idea of principal angles, we demonstrate that the Oseledets' subspaces defined by the LVs computed from the approximate trajectory are less sensitive than the individual vectors.

Submitted 22 August, 2023; originally announced August 2023.

Comments: 20 pages, 9 figures and no tables

MSC Class: 37Mxx; 37Nxx

arXiv:2308.05235 [pdf, other]

doi 10.1109/LGRS.2024.3354175

Spatial Gated Multi-Layer Perceptron for Land Use and Land Cover Mapping

Authors: Ali Jamali, Swalpa Kumar Roy, Danfeng Hong, Peter M Atkinson, Pedram Ghamisi

Abstract: Convolutional Neural Networks (CNNs) are models that are utilized extensively for the hierarchical extraction of features. Vision transformers (ViTs), through the use of a self-attention mechanism, have recently achieved superior modeling of global contextual information compared to CNNs. However, to realize their image classification strength, ViTs require substantial training datasets. Where the available training data are limited, current advanced multi-layer perceptrons (MLPs) can provide viable alternatives to both deep CNNs and ViTs. In this paper, we developed the SGU-MLP, a learning algorithm that effectively uses both MLPs and spatial gating units (SGUs) for precise land use land cover (LULC) mapping. Results illustrated the superiority of the developed SGU-MLP classification algorithm over several CNN and CNN-ViT-based models, including HybridSN, ResNet, iFormer, EfficientFormer and CoAtNet. The proposed SGU-MLP algorithm was tested through three experiments in Houston, USA, Berlin, Germany and Augsburg, Germany. The SGU-MLP classification model was found to consistently outperform the benchmark CNN and CNN-ViT-based algorithms. For example, for the Houston experiment, SGU-MLP significantly outperformed HybridSN, CoAtNet, EfficientFormer, iFormer and ResNet by approximately 15%, 19%, 20%, 21%, and 25%, respectively, in terms of average accuracy. The code will be made publicly available at https://github.com/aj1365/SGUMLP

Submitted 9 August, 2023; originally announced August 2023.

Comments: Submitted in IEEE

arXiv:2306.04947 [pdf, other]

Neighborhood Attention Makes the Encoder of ResUNet Stronger for Accurate Road Extraction

Authors: Ali Jamali, Swalpa Kumar Roy, Jonathan Li, Pedram Ghamisi

Abstract: In the domain of remote sensing image interpretation, road extraction from high-resolution aerial imagery has already been a hot research topic. Although deep CNNs have presented excellent results for semantic segmentation, the efficiency and capabilities of vision transformers are yet to be fully researched. As such, for accurate road extraction, a deep semantic segmentation neural network that utilizes the abilities of residual learning, HetConvs, UNet, and vision transformers, which is called ResUNetFormer, is proposed in this letter. The developed ResUNetFormer is evaluated on various cutting-edge deep learning-based road extraction techniques on the public Massachusetts road dataset. Statistical and visual results demonstrate the superiority of the ResUNetFormer over the state-of-the-art CNNs and vision transformers for segmentation. The code will be made available publicly at https://github.com/aj1365/ResUNetFormer.

Submitted 8 June, 2023; originally announced June 2023.

Comments: Submitted in IEEE

arXiv:2209.06469 [pdf, other]

Learning Deep Optimal Embeddings with Sinkhorn Divergences

Authors: Soumava Kumar Roy, Yan Han, Mehrtash Harandi, Lars Petersson

Abstract: Deep Metric Learning algorithms aim to learn an efficient embedding space to preserve the similarity relationships among the input data. Whilst these algorithms have achieved significant performance gains across a wide plethora of tasks, they have also failed to consider and increase comprehensive similarity constraints; thus learning a sub-optimal metric in the embedding space. Moreover, up until now; there have been few studies with respect to their performance in the presence of noisy labels. Here, we address the concern of learning a discriminative deep embedding space by designing a novel, yet effective Deep Class-wise Discrepancy Loss (DCDL) function that segregates the underlying similarity distributions (thus introducing class-wise discrepancy) of the embedding points between each and every class. Our empirical results across three standard image classification datasets and two fine-grained image recognition datasets in the presence and absence of noise clearly demonstrate the need for incorporating such class-wise similarity relationships along with traditional algorithms while learning a discriminative embedding space.

Submitted 14 September, 2022; originally announced September 2022.

arXiv:2208.10810 [pdf, other]

doi 10.1016/j.physd.2023.133765

Probing robustness of nonlinear filter stability numerically using Sinkhorn divergence

Authors: Pinak Mandal, Shashank Kumar Roy, Amit Apte

Abstract: Using the recently developed Sinkhorn algorithm for approximating the Wasserstein distance between probability distributions represented by Monte Carlo samples, we demonstrate exponential filter stability of two commonly used nonlinear filtering algorithms, namely, the particle filter and the ensemble Kalman filter, for deterministic dynamical systems. We also establish numerically a relation between filter stability and filter convergence by showing that the Wasserstein distance between filters with two different initial conditions is proportional to the bias or the RMSE of the filter.

Submitted 27 April, 2023; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: 14 pages, 2 figures, 2 tables

MSC Class: 93E11(Primary); 60G35; 62M20
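
The abstract above relies on the Sinkhorn algorithm to approximate a Wasserstein-type distance between two sets of Monte Carlo samples. The listing does not give the authors' implementation, so the following is only a minimal, generic sketch of entropy-regularised optimal transport between two one-dimensional samples with uniform weights. The function names, the squared-distance cost, the regularisation strength `eps`, and the iteration count are all illustrative assumptions, and the debiased "divergence" shown is one common form rather than necessarily the one used in the paper.

```python
import math


def sinkhorn_cost(xs, ys, eps=0.5, n_iters=200):
    """Transport cost <P, C> of the entropy-regularised plan between two
    1-D samples with uniform weights (hypothetical helper, not the paper's code)."""
    n, m = len(xs), len(ys)
    a = [1.0 / n] * n                      # uniform weights on xs
    b = [1.0 / m] * m                      # uniform weights on ys
    # Squared-distance cost matrix and the corresponding Gibbs kernel.
    C = [[(x - y) ** 2 for y in ys] for x in xs]
    K = [[math.exp(-c / eps) for c in row] for row in C]
    u = [1.0] * n
    v = [1.0] * m
    # Sinkhorn iterations: alternately rescale rows and columns so that the
    # plan P_ij = u_i * K_ij * v_j matches the two marginals a and b.
    for _ in range(n_iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return sum(u[i] * K[i][j] * v[j] * C[i][j]
               for i in range(n) for j in range(m))


def sinkhorn_divergence(xs, ys, eps=0.5, n_iters=200):
    """Debiased variant: subtract the self-transport terms so that the
    value is zero when both samples coincide."""
    return (sinkhorn_cost(xs, ys, eps, n_iters)
            - 0.5 * sinkhorn_cost(xs, xs, eps, n_iters)
            - 0.5 * sinkhorn_cost(ys, ys, eps, n_iters))
```

This plain-list version is meant for small samples with moderate cost values; for large costs or very small `eps` the kernel entries underflow, and a log-domain formulation would be the usual remedy.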

arXiv:2204.11449 [pdf, other]

OCFormer: One-Class Transformer Network for Image Classification

Authors: Prerana Mukherjee, Chandan Kumar Roy, Swalpa Kumar Roy

Abstract: We propose a novel deep learning framework based on Vision Transformers (ViT) for one-class classification. The core idea is to use zero-centered Gaussian noise as a pseudo-negative class for latent space representation and then train the network using the optimal loss function. In prior works, there have been tremendous efforts to learn a good representation using varieties of loss functions, which ensures both discriminative and compact properties. The proposed one-class Vision Transformer (OCFormer) is exhaustively experimented on CIFAR-10, CIFAR-100, Fashion-MNIST and CelebA eyeglasses datasets. Our method has shown significant improvements over competing CNN based one-class classifier approaches.

Submitted 25 April, 2022; originally announced April 2022.

arXiv:2203.17076 [pdf, other]

doi 10.1109/TGRS.2022.3196057

Deep Hyperspectral Unmixing using Transformer Network

Authors: Preetam Ghosh, Swalpa Kumar Roy, Bikram Koirala, Behnood Rasti, Paul Scheunders

Abstract: Currently, this paper is under review in IEEE. Transformers have intrigued the vision research community with their state-of-the-art performance in natural language processing. With their superior performance, transformers have found their way in the field of hyperspectral image classification and achieved promising results. In this article, we harness the power of transformers to conquer the task of hyperspectral unmixing and propose a novel deep unmixing model with transformers. We aim to utilize the ability of transformers to better capture the global feature dependencies in order to enhance the quality of the endmember spectra and the abundance maps. The proposed model is a combination of a convolutional autoencoder and a transformer. The hyperspectral data is encoded by the convolutional encoder. The transformer captures long-range dependencies between the representations derived from the encoder. The data are reconstructed using a convolutional decoder. We applied the proposed unmixing model to three widely used unmixing datasets, i.e., Samson, Apex, and Washington DC mall and compared it with the state-of-the-art in terms of root mean squared error and spectral angle distance. The source code for the proposed model will be made publicly available at https://github.com/preetam22n/DeepTrans-HSU.

Submitted 31 March, 2022; originally announced March 2022.

Comments: Currently, this paper is under review in IEEE
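
The abstract above compares unmixing results using root mean squared error and spectral angle distance. As a rough illustration of the standard definitions of these two metrics (not the authors' evaluation code), the sketch below computes both for a pair of equal-length vectors, such as an estimated and a reference endmember spectrum. The function names are hypothetical, and the inputs are assumed to be non-zero vectors of the same length.

```python
import math


def rmse(estimate, reference):
    """Root mean squared error between two equal-length vectors."""
    squared = [(e - r) ** 2 for e, r in zip(estimate, reference)]
    return math.sqrt(sum(squared) / len(squared))


def spectral_angle(estimate, reference):
    """Angle in radians between two spectra; it ignores overall scaling,
    so a spectrum and any positive multiple of it are at angle zero."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    norm_e = math.sqrt(sum(e * e for e in estimate))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cosine = max(-1.0, min(1.0, dot / (norm_e * norm_r)))
    return math.acos(cosine)
```

The two metrics are complementary: RMSE is sensitive to differences in magnitude, while the spectral angle depends only on the shape of the spectrum.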

arXiv:2203.16952 [pdf, other]

doi 10.1109/TGRS.2023.3286826

Multimodal Fusion Transformer for Remote Sensing Image Classification

Authors: Swalpa Kumar Roy, Ankur Deria, Danfeng Hong, Behnood Rasti, Antonio Plaza, Jocelyn Chanussot

Abstract: Vision transformers (ViTs) have been trending in image classification tasks due to their promising performance when compared to convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViTs in hyperspectral image (HSI) classification tasks. To achieve satisfactory performance, close to that of CNNs, transformers need fewer parameters. ViTs and other similar transformers use an external classification (CLS) token which is randomly initialized and often fails to generalize well, whereas other sources of multimodal datasets, such as light detection and ranging (LiDAR) offer the potential to improve these models by means of a CLS. In this paper, we introduce a new multimodal fusion transformer (MFT) network which comprises a multihead cross patch attention (mCrossPA) for HSI land-cover classification. Our mCrossPA utilizes other sources of complementary information in addition to the HSI in the transformer encoder to achieve better generalization. The concept of tokenization is used to generate CLS and HSI patch tokens, helping to learn a distinctive representation in a reduced and hierarchical feature space. Extensive experiments are carried out on widely used benchmark datasets i.e., the University of Houston, Trento, University of Southern Mississippi Gulfpark (MUUFL), and Augsburg. We compare the results of the proposed MFT model with other state-of-the-art transformers, classical CNNs, and conventional classifiers models. The superior performance achieved by the proposed model is due to the use of multihead cross patch attention. The source code will be made available publicly at https://github.com/AnkurDeria/MFT.

Submitted 20 June, 2023; v1 submitted 31 March, 2022; originally announced March 2022.

Comments: Published in IEEE Transactions on Geoscience and Remote Sensing

arXiv:2203.15865 [pdf, other]

On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation

Authors: Soumava Kumar Roy, Leonardo Citraro, Sina Honari, Pascal Fua

Abstract: Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant. However, as the acquisition of ground-truth 3D labels is labor intensive and time consuming, recent attention has shifted towards semi- and weakly-supervised learning. Generating an effective form of supervision with little annotations still poses major challenge in crowded scenes. In this paper we propose to impose multi-view geometrical constraints by means of a weighted differentiable triangulation and use it as a form of self-supervision when no labels are available. We therefore train a 2D pose estimator in such a way that its predictions correspond to the re-projection of the triangulated 3D pose and train an auxiliary network on them to produce the final 3D poses. We complement the triangulation with a weighting mechanism that alleviates the impact of noisy predictions caused by self-occlusion or occlusion from other subjects. We demonstrate the effectiveness of our semi-supervised approach on Human3.6M and MPI-INF-3DHP datasets, as well as on a new multi-view multi-person dataset that features occlusion.

Submitted 28 June, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

arXiv:2201.01001 [pdf, other]

Attention Mechanism Meets with Hybrid Dense Network for Hyperspectral Image Classification

Authors: Muhammad Ahmad, Adil Mehmood Khan, Manuel Mazzara, Salvatore Distefano, Swalpa Kumar Roy, Xin Wu

Abstract: Convolutional Neural Networks (CNN) are more suitable, indeed. However, fixed kernel sizes make traditional CNN too specific, neither flexible nor conducive to feature learning, thus impacting on the classification accuracy. The convolution of different kernel size networks may overcome this problem by capturing more discriminating and relevant information. In light of this, the proposed solution aims at combining the core idea of 3D and 2D Inception net with the Attention mechanism to boost the HSIC CNN performance in a hybrid scenario. The resulting attention-fused hybrid network (AfNet) is based on three attention-fused parallel hybrid sub-nets with different kernels in each block repeatedly using high-level features to enhance the final ground-truth maps. In short, AfNet is able to selectively filter out the discriminative features critical for classification. Several tests on HSI datasets provided competitive results for AfNet compared to state-of-the-art models. The proposed pipeline achieved, indeed, an overall accuracy of 97% for the Indian Pines, 100% for Botswana, 99% for Pavia University, Pavia Center, and Salinas datasets.

Submitted 4 January, 2022; originally announced January 2022.

arXiv:2107.02554 [pdf, other]

On the Hardness of Compressing Weights

Authors: Bart M. P. Jansen, Shivesh K. Roy, Michał Włodarczyk

Abstract: We investigate computational problems involving large weights through the lens of kernelization, which is a framework of polynomial-time preprocessing aimed at compressing the instance size. Our main focus is the weighted Clique problem, where we are given an edge-weighted graph and the goal is to detect a clique of total weight equal to a prescribed value. We show that the weighted variant, parameterized by the number of vertices $n$, is significantly harder than the unweighted problem by presenting an $O(n^{3 - \varepsilon})$ lower bound on the size of the kernel, under the assumption that NP $\not \subseteq$ coNP/poly. This lower bound is essentially tight: we show that we can reduce the problem to the case with weights bounded by $2^{O(n)}$, which yields a randomized kernel of $O(n^3)$ bits. We generalize these results to the weighted $d$-Uniform Hyperclique problem, Subset Sum, and weighted variants of Boolean Constraint Satisfaction Problems (CSPs). We also study weighted minimization problems and show that weight compression is easier when we only want to preserve the collection of optimal solutions. Namely, we show that for node-weighted Vertex Cover on bipartite graphs it is possible to maintain the set of optimal solutions using integer weights from the range $[1, n]$, but if we want to maintain the ordering of the weights of all inclusion-minimal solutions, then weights as large as $2^{Ω(n)}$ are necessary.

Submitted 6 July, 2021; originally announced July 2021.

Comments: To appear at MFCS'21

arXiv:2105.10190 [pdf, other]

AngularGrad: A New Optimization Technique for Angular Convergence of Convolutional Neural Networks

Authors: S. K. Roy, M. E. Paoletti, J. M. Haut, S. R. Dubey, P. Kar, A. Plaza, B. B. Chaudhuri

Abstract: Convolutional neural networks (CNNs) are trained using stochastic gradient descent (SGD)-based optimizers. Recently, the adaptive moment estimation (Adam) optimizer has become very popular due to its adaptive momentum, which tackles the dying gradient problem of SGD. Nevertheless, existing optimizers are still unable to exploit the optimization curvature information efficiently. This paper proposes a new AngularGrad optimizer that considers the behavior of the direction/angle of consecutive gradients. This is the first attempt in the literature to exploit the gradient angular information apart from its magnitude. The proposed AngularGrad generates a score to control the step size based on the gradient angular information of previous iterations. Thus, the optimization steps become smoother as a more accurate step size of immediate past gradients is captured through the angular information. Two variants of AngularGrad are developed based on the use of Tangent or Cosine functions for computing the gradient angular information. Theoretically, AngularGrad exhibits the same regret bound as Adam for convergence purposes. Nevertheless, extensive experiments conducted on benchmark data sets against state-of-the-art methods reveal a superior performance of AngularGrad. The source code will be made publicly available at: https://github.com/mhaut/AngularGrad.

Submitted 9 September, 2023; v1 submitted 21 May, 2021; originally announced May 2021.

arXiv:2103.04059 [pdf, other]

Semantic-aware Knowledge Distillation for Few-Shot Class-Incremental Learning

Authors: Ali Cheraghian, Shafin Rahman, Pengfei Fang, Soumava Kumar Roy, Lars Petersson, Mehrtash Harandi

Abstract: Few-shot class incremental learning (FSCIL) portrays the problem of learning new concepts gradually, where only a few examples per concept are available to the learner. Due to the limited number of examples for training, the techniques developed for standard incremental learning cannot be applied verbatim to FSCIL. In this work, we introduce a distillation algorithm to address the problem of FSCIL and propose to make use of semantic information during training. To this end, we make use of word embeddings as semantic information which is cheap to obtain and which facilitate the distillation process. Furthermore, we propose a method based on an attention mechanism on multiple parallel embeddings of visual data to align visual and semantic vectors, which reduces issues related to catastrophic forgetting. Via experiments on MiniImageNet, CUB200, and CIFAR100 dataset, we establish new state-of-the-art results by outperforming existing approaches.

Submitted 30 March, 2021; v1 submitted 6 March, 2021; originally announced March 2021.

Comments: Accepted at CVPR 2021

arXiv:2103.01844 [pdf, other]

doi 10.1063/5.0050274

A Molecular Field Approach to Pressure Induced Phase Transitions in Liquid Crystals : Smectic-Nematic transition

Authors: Sabana Shabnam, Sudeshna DasGupta, Nababrata Ghoshal, Ananda DasGupta, Soumen Kumar Roy

Abstract: Since a rigorous microscopic treatment of a nematic fluid system based on a pairwise interaction potential is immensely complex we had introduced a simple mean field potential which was a modification of the Maier-Saupe potential in a previous paper. Building up on that here we have modified that potential to take into account the various aspects of a smectic A-nematic phase transition. In particular we have studied the dependence of the phase transition on the coupling coefficient between the nematic and smectic order parameters which in turn depends on the length of alkyl chain, existence of tricritical point, variation of entropy and specific heat as well as the dependence of the phase transition on pressure.

Submitted 2 March, 2021; originally announced March 2021.

arXiv:2101.06116 [pdf, other]

doi 10.1109/JSTARS.2021.3133021

Hyperspectral Image Classification-Traditional to Deep Models: A Survey for Future Prospects

Authors: Muhammad Ahmad, Sidrah Shabbir, Swalpa Kumar Roy, Danfeng Hong, Xin Wu, Jing Yao, Adil Mehmood Khan, Manuel Mazzara, Salvatore Distefano, Jocelyn Chanussot

Abstract: Hyperspectral Imaging (HSI) has been extensively utilized in many real-life applications because it benefits from the detailed spectral information contained in each pixel. Notably, the complex characteristics i.e., the nonlinear relation among the captured spectral information and the corresponding object of HSI data make accurate classification challenging for traditional methods. In the last few years, Deep Learning (DL) has been substantiated as a powerful feature extractor that effectively addresses the nonlinear problems that appeared in a number of computer vision tasks. This prompts the deployment of DL for HSI classification (HSIC) which revealed good performance. This survey enlists a systematic overview of DL for HSIC and compared state-of-the-art strategies on the said topic. Primarily, we will encapsulate the main challenges of traditional machine learning for HSIC and then we will acquaint the superiority of DL to address these problems. This survey breakdown the state-of-the-art DL frameworks into spectral features, spatial features, and together spatial-spectral features to systematically analyze the achievements (future research directions as well) of these frameworks for HSIC. Moreover, we will consider the fact that DL requires a large number of labeled training examples whereas acquiring such a number for HSIC is challenging in terms of time and cost. Therefore, this survey discusses some strategies to improve the generalization performance of DL strategies which can provide some future guidelines.

Submitted 27 April, 2022; v1 submitted 15 January, 2021; originally announced January 2021.

Comments: https://ieeexplore.ieee.org/abstract/document/9645266

arXiv:2011.01194 [pdf, other]

Effective General Relativistic Description of Jamming in Granular Matter

Authors: Soumendra Kishore Roy, Pratyusava Baral, Ratna Koley, Parthasarathi Majumdar

Abstract: We propose here that certain observational features of granular matter in the infrared limit, exhibiting the phenomenon of jamming, arise from an underlying effective general relativistic description. The proposal stems from the assumption (which we justify on physical grounds) that grains in granular matter move freely in an effective curved Riemannian space. The termination of their trajectories at the onset of jamming is obtained from the focussing of a converging congruence of geodesics in such a space, as a solution of the Raychaudhuri equation for such congruences. This may happen irrespective of whether or not the curvature is sourced by external stresses (via an effective Einstein equation), although the properties of the resultant jammed state solution do differ in the two cases. A definite prediction of this geometrical approach is the negative role played by those trajectories which twist about each other, in reaching the jammed state. The local symmetries of granular interaction, translational and rotational invariance (corresponding to 'force balance' and 'torque balance' in standard force-based approaches to jamming) are inherent in the effective general relativity framework. A recently-proposed effective elasticity model of the jammed state, based on a tensorial variant of standard electrostatics (Vector Charge Theory), is seen to be entirely subsumed within the linearized version of the effective general relativistic description.

Submitted 27 August, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

Comments: 8 pages Revtex 4-1, 6 figures. An expanded version of the previous one, with substantial modifications in the Introduction and also some modifications in section 4. One figure modified and two new figures added. Changes made in response to criticism of colleagues and anonymous referees

arXiv:2007.05328 [pdf, ps, other]

doi 10.1140/epjp/s13360-021-01473-1

Compact star deformation and universal relationship for magnetized white dwarfs

Authors: Sujan Kumar Roy, Somnath Mukhopadhyay, D. N. Basu

Abstract: Recently super-Chandrasekhar mass limit has been derived theoretically in presence of strong magnetic field to complement experimental observations. In the framework of Newtonian physics, we have studied the equilibrium configurations of such magnetized white dwarfs by using the relativistic Thomas-Fermi equation of state for magnetized white-dwarfs. Hartle formalism, for slowly rotating stars, has been employed to obtain the equations of equilibrium. Various physical quantities of uniformly rotating and non-rotating white dwarfs have been calculated within this formalism. Consequently, the universality relationship between the moment of inertia(I), rotational love number($λ$) and spin induced quadrupole moment(Q), namely the I-Love-Q relationship, has been investigated for such magnetized white dwarfs. The relationship between I, eccentricity and Q i.e. I-eccentricity-Q relationship has also been derived. Further, we have found that, the I-eccentricity-Q relationship is more universal in comparison to I-Love-Q relationship.

Submitted 10 July, 2020; originally announced July 2020.

Comments: 13 pages including 23 figures and 1 table

Journal ref: Eur.Phys.J.Plus 136 (2021) no.4, 467

arXiv:2007.02887 [pdf, ps, other]

doi 10.1103/PhysRevD.102.084045

Probing the post-Minkowskian approximation using recursive addition of self-interactions

Authors: Soumendra Kishore Roy, Ratna Koley, Parthasarathi Majumdar

Abstract: We address the problem of deriving the post-Minkowskian approximation, widely used in current gravitational wave literature by investigating a possible deduction out of the recursive Nöther coupling approach, from the Pauli-Fierz spin-2 theory in flat spacetime. We find that this approach yields the post-Minkowskian approximation correctly to the first three orders, without invoking any weak-field limit of general relativity. This connection thus establishes that the post-Minkowskian approximation has a connotation independent of a weak-field expansion of general relativity, which is the manner usually presented in the literature. As a consequence, a link manifests between the recursive Nöther coupling approach to deriving general relativity from a linear spin-2 theory in flat spacetime and theoretical analyses of recent detection of gravitational wave events.

Submitted 19 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

Journal ref: Phys. Rev. D 102, 084045 (2020)

arXiv:2006.09597 [pdf, other]

Cross-Correlated Attention Networks for Person Re-Identification

Authors: Jieming Zhou, Soumava Kumar Roy, Pengfei Fang, Mehrtash Harandi, Lars Petersson

Abstract: Deep neural networks need to make robust inference in the presence of occlusion, background clutter, pose and viewpoint variations -- to name a few -- when the task of person re-identification is considered. Attention mechanisms have recently proven to be successful in handling the aforementioned challenges to some degree. However previous designs fail to capture inherent inter-dependencies between the attended features; leading to restricted interactions between the attention blocks. In this paper, we propose a new attention module called Cross-Correlated Attention (CCA); which aims to overcome such limitations by maximizing the information gain between different attended regions. Moreover, we also propose a novel deep network that makes use of different attention mechanisms to learn robust and discriminative representations of person images. The resulting model is called the Cross-Correlated Attention Network (CCAN). Extensive experiments demonstrate that the CCAN comfortably outperforms current state-of-the-art algorithms by a tangible margin.

Submitted 16 June, 2020; originally announced June 2020.

Comments: Accepted by Image and Vision Computing

Journal ref: Image and Vision Computing, Vol. 100, 2020, p. 103931

arXiv:2005.07231 [pdf, other]

doi 10.1093/mnras/staa3346

Prospects of probing dark energy with eLISA: Standard versus null diagnostics

Authors: Pratyusava Baral, Soumendra Kishore Roy, Supratik Pal

Abstract: Gravitational waves from supermassive black hole binary mergers along with an electromagnetic counterpart have the potential to shed 'light' on the nature of dark energy in the intermediate redshift regime. Accurate measurement of dark energy parameters at intermediate redshift is extremely essential to improve our understanding of dark energy, and to possibly resolve a couple of tensions involving cosmological parameters. We present a Fisher matrix forecast analysis in the context of eLISA to predict the errors for three different cases: the non-interacting dark energy with constant and evolving equation of state (EoS), and the interacting dark sectors with a generalized parametrization. In all three cases, we perform the analysis for two separate formalisms, namely, the standard EoS formalism and the Om parametrization which is a model-independent null diagnostic for a wide range of fiducial values in both phantom and non-phantom regions, to make a comparative analysis between the prospects of these two diagnostics in eLISA. Our analysis reveals that it is wiser and more effective to probe the null diagnostic instead of the standard EoS parameters for any possible signature of dark energy at intermediate redshift measurements like eLISA.

Submitted 14 December, 2020; v1 submitted 14 May, 2020; originally announced May 2020.

Journal ref: Monthly Notices of the Royal Astronomical Society, Volume 500, Issue 3, January 2021

arXiv:1909.11015 [pdf, other]

diffGrad: An Optimization Method for Convolutional Neural Networks

Authors: Shiv Ram Dubey, Soumendu Chakraborty, Swalpa Kumar Roy, Snehasis Mukherjee, Satish Kumar Singh, Bidyut Baran Chaudhuri

Abstract: Stochastic Gradient Descent (SGD) is one of the core techniques behind the success of deep neural networks. The gradient provides information on the direction in which a function has the steepest rate of change. The main problem with basic SGD is that it changes all parameters by equal-sized steps, irrespective of gradient behavior. Hence, an efficient way of deep network optimization is to make adaptive step sizes for each parameter. Recently, several attempts have been made to improve gradient descent methods such as AdaGrad, AdaDelta, RMSProp and Adam. These methods rely on the square roots of exponential moving averages of squared past gradients. Thus, these methods do not take advantage of local change in gradients. In this paper, a novel optimizer is proposed based on the difference between the present and the immediate past gradient (i.e., diffGrad). In the proposed diffGrad optimization technique, the step size is adjusted for each parameter in such a way that it has a larger step size for parameters whose gradient changes faster and a smaller step size for parameters whose gradient changes more slowly. The convergence analysis is done using the regret bound approach of the online learning framework. Rigorous analysis is made in this paper over three synthetic complex non-convex functions. The image categorization experiments are also conducted over the CIFAR10 and CIFAR100 datasets to observe the performance of diffGrad with respect to the state-of-the-art optimizers such as SGDM, AdaGrad, AdaDelta, RMSProp, AMSGrad, and Adam. The residual unit (ResNet) based Convolutional Neural Networks (CNN) architecture is used in the experiments. The experiments show that diffGrad outperforms other optimizers. Also, we show that diffGrad performs uniformly well for training CNN using different activation functions. The source code is made publicly available at https://github.com/shivram1987/diffGrad.

Submitted 26 November, 2021; v1 submitted 12 September, 2019; originally announced September 2019.

Journal ref: IEEE Transactions on Neural Networks and Learning Systems, 2020
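
The diffGrad abstract above describes scaling each parameter's step by how much its gradient changed since the previous iteration. The following is a minimal sketch of that idea, not the authors' reference implementation (which is at the URL given in the abstract). It assumes Adam-style moment estimates with bias correction and a sigmoid of the absolute gradient difference as the scaling factor; the abstract does not spell out the exact formula. The function names, the state layout, the zero initial previous gradient, and the default hyperparameters are all hypothetical choices made for illustration.

```python
import math


def init_state(n):
    """Per-parameter optimizer state for n parameters (hypothetical layout)."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n, "prev_grad": [0.0] * n}


def diffgrad_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One diffGrad-style update on a list of floats (illustrative sketch)."""
    state["t"] += 1
    t = state["t"]
    new_theta = []
    for i, (p, g) in enumerate(zip(theta, grad)):
        # Adam-style exponential moving averages of the gradient and its square.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialised averages.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        # Assumed scaling: sigmoid of |previous gradient - current gradient|.
        # It lies in [0.5, 1), so a fast-changing gradient keeps a larger step
        # and a slowly changing gradient is damped towards half the Adam step.
        xi = 1.0 / (1.0 + math.exp(-abs(state["prev_grad"][i] - g)))
        state["prev_grad"][i] = g
        new_theta.append(p - lr * xi * m_hat / (math.sqrt(v_hat) + eps))
    return new_theta


def minimize_quadratic(steps=2000, lr=0.01):
    """Toy usage: run the update on f(x, y) = x^2 + y^2 from (3, -2)."""
    theta = [3.0, -2.0]
    state = init_state(2)
    for _ in range(steps):
        grad = [2.0 * p for p in theta]  # analytic gradient of the quadratic
        theta = diffgrad_step(theta, grad, state, lr=lr)
    return theta
```

Calling `minimize_quadratic()` returns the parameters after the toy run, which can be compared against the starting point `(3, -2)`. Setting the scaling factor to a constant 1 in this sketch reduces the update to plain Adam, which makes the effect of the gradient-difference term easy to isolate.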

arXiv:1907.13480 [pdf, ps, other]

doi 10.1103/PhysRevD.100.063008

Relativistic Feynman-Metropolis-Teller Equation of State for White Dwarfs in presence of Magnetic Field

Authors: Sujan Kumar Roy, Somnath Mukhopadhyay, Joydev Lahiri, D. N. Basu

Abstract: The relativistic Feynman-Metropolis-Teller treatment of compressed atoms is extended to treat magnetized matter. Each atomic configuration is confined by a Wigner-Seitz cell and is characterized by a positive electron Fermi energy which varies insignificantly with the magnetic field. In the relativistic treatment, the limiting configuration is reached when the Wigner-Seitz cell radius equals the radius of the nucleus, with a maximum value of the electron Fermi energy which cannot be attained in the presence of a magnetic field due to the effect of Landau quantization of electrons within the Wigner-Seitz cell. This treatment is implemented to develop the Equation of State for magnetized White Dwarf stars in the presence of Coulomb screening. The mass-radius relations for magnetized White Dwarfs are obtained by solving the general relativistic hydrostatic equilibrium equations using the Schwarzschild metric description suitable for non-rotating and slowly rotating stars. The explicit effects of the magnetic energy density and pressure contributed by a density-dependent magnetic field are included to find the stable configurations of magnetized Super-Chandrasekhar White Dwarfs.

Submitted 30 July, 2019; originally announced July 2019.

Comments: 11 pages including 12 figures and 1 table. arXiv admin note: text overlap with arXiv:1507.05439

Journal ref: Phys. Rev. D 100, 063008 (2019)

Showing 1–50 of 74 results for author: Roy, S K