Showing 1–50 of 258 results for author: Paul, S

Searching in archive cs.
  1. arXiv:2505.10046  [pdf, ps, other]

    cs.CV

    Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

    Authors: Bingda Tang, Boyang Zheng, Xichen Pan, Sayak Paul, Saining Xie

    Abstract: This paper does not describe a new method; instead, it provides a thorough exploration of an important yet understudied design space related to recent advances in text-to-image synthesis -- specifically, the deep fusion of large language models (LLMs) and diffusion transformers (DiTs) for multi-modal generation. Previous studies mainly focused on overall system performance rather than detailed com…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2504.16080  [pdf, other]

    cs.CV

    From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning

    Authors: Le Zhuo, Liangbing Zhao, Sayak Paul, Yue Liao, Renrui Zhang, Yi Xin, Peng Gao, Mohamed Elhoseiny, Hongsheng Li

    Abstract: Recent text-to-image diffusion models achieve impressive visual quality through extensive scaling of training data and model parameters, yet they often struggle with complex scenes and fine-grained details. Inspired by the self-reflection capabilities emergent in large language models, we propose ReflectionFlow, an inference-time framework enabling diffusion models to iteratively reflect upon and…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: All code, checkpoints, and datasets are available at https://diffusion-cot.github.io/reflection2perfection

  3. arXiv:2504.12711  [pdf, other]

    cs.CV cs.AI eess.IV

    NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

    Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, et al. (112 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includ…

    Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

  4. arXiv:2504.10234  [pdf, other]

    cs.FL

    Resolving Nondeterminism by Chance

    Authors: Soumyajit Paul, David Purser, Sven Schewe, Qiyi Tang, Patrick Totzke, Di-De Yen

    Abstract: History-deterministic automata are those in which nondeterministic choices can be correctly resolved stepwise: there is a strategy to select a continuation of a run given the next input letter so that if the overall input word admits some accepting run, then the constructed run is also accepting. Motivated by checking qualitative properties in probabilistic verification, we consider the setting…

    Submitted 14 April, 2025; originally announced April 2025.

  5. arXiv:2504.05537  [pdf, other]

    cs.CV cs.AI

    Towards Efficient Real-Time Video Motion Transfer via Generative Time Series Modeling

    Authors: Tasmiah Haque, Md. Asif Bin Syed, Byungheon Jeong, Xue Bai, Sumit Mohan, Somdyuti Paul, Imtiaz Ahmed, Srinjoy Das

    Abstract: We propose a deep learning framework designed to significantly optimize bandwidth for motion-transfer-enabled video applications, including video conferencing, virtual reality interactions, health monitoring systems, and vision-based real-time anomaly detection. To capture complex motion effectively, we utilize the First Order Motion Model (FOMM), which encodes dynamic objects by detecting keypoin…

    Submitted 7 April, 2025; originally announced April 2025.

  6. arXiv:2504.00428  [pdf, other]

    cs.CR cs.AI

    LLM-Assisted Proactive Threat Intelligence for Automated Reasoning

    Authors: Shuva Paul, Farhad Alemi, Richard Macwan

    Abstract: Successful defense against dynamically evolving cyber threats requires advanced and sophisticated techniques. This research presents a novel approach to enhance real-time cybersecurity threat detection and response by integrating large language models (LLMs) and Retrieval-Augmented Generation (RAG) systems with continuous threat intelligence feeds. Leveraging recent advancements in LLMs, specifica…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 10 Pages, 1 Figure

  7. arXiv:2503.21906  [pdf, other]

    cs.LO cs.FL cs.MA

    Monitoring Spatially Distributed Cyber-Physical Systems with Alternating Finite Automata

    Authors: Anand Balakrishnan, Sheryl Paul, Simone Silvetti, Laura Nenzi, Jyotirmoy V. Deshmukh

    Abstract: Modern cyber-physical systems (CPS) can consist of various networked components and agents interacting and communicating with each other. In the context of spatially distributed CPS, these connections can be dynamically dependent on the spatial configuration of the various components and agents. In these settings, robust monitoring of the distributed components is vital to ensuring complex behavio…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Accepted to HSCC 2025

  8. Exploring the Efficacy of Partial Denoising Using Bit Plane Slicing for Enhanced Fracture Identification: A Comparative Study of Deep Learning-Based Approaches and Handcrafted Feature Extraction Techniques

    Authors: Snigdha Paul, Sambit Mallick, Anindya Sen

    Abstract: Computer vision has transformed medical diagnosis, treatment, and research through advanced image processing and machine learning techniques. Fracture classification, a critical area in healthcare, has greatly benefited from these advancements, yet accurate detection is challenged by complex patterns and image noise. Bit plane slicing enhances medical images by reducing noise interference and extr…

    Submitted 21 March, 2025; originally announced March 2025.

  9. arXiv:2503.09641  [pdf, other]

    cs.GR

    SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation

    Authors: Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Enze Xie, Song Han

    Abstract: This paper presents SANA-Sprint, an efficient diffusion model for ultra-fast text-to-image (T2I) generation. SANA-Sprint is built on a pre-trained foundation model and augmented with hybrid distillation, dramatically reducing inference steps from 20 to 1-4. We introduce three key innovations: (1) We propose a training-free approach that transforms a pre-trained flow-matching model for continuous-t…

    Submitted 23 March, 2025; v1 submitted 12 March, 2025; originally announced March 2025.

    Comments: 22 pages, 11 figures, 8 tables, In submission

  10. arXiv:2502.13933  [pdf, other]

    cs.GT

    Simplifying imperfect recall games

    Authors: Hugo Gimbert, Soumyajit Paul, B. Srivathsan

    Abstract: In games with imperfect recall, players may forget the sequence of decisions they made in the past. When players also forget whether they have already encountered their current decision point, they are said to be absent-minded. Solving one-player imperfect recall games is known to be NP-hard, even when the players are not absent-minded. This motivates the search for polynomial-time solvable subcla…

    Submitted 19 February, 2025; originally announced February 2025.

  11. arXiv:2502.11927  [pdf, other]

    cs.LG

    Continual Learning Should Move Beyond Incremental Classification

    Authors: Rupert Mitchell, Antonio Alliegro, Raffaello Camoriano, Dustin Carrión-Ojeda, Antonio Carta, Georgia Chalvatzaki, Nikhil Churamani, Carlo D'Eramo, Samin Hamidi, Robin Hesse, Fabian Hinder, Roshni Ramanna Kamath, Vincenzo Lomonaco, Subarnaduti Paul, Francesca Pistilli, Tinne Tuytelaars, Gido M van de Ven, Kristian Kersting, Simone Schaub-Meyer, Martin Mundt

    Abstract: Continual learning (CL) is the sub-field of machine learning concerned with accumulating knowledge in dynamic environments. So far, CL research has mainly focused on incremental classification tasks, where models learn to classify new categories while retaining knowledge of previously learned ones. Here, we argue that maintaining such a focus limits both theoretical development and practical appli…

    Submitted 17 February, 2025; originally announced February 2025.

  12. arXiv:2502.03038  [pdf, other]

    cs.AI cs.CY cs.LG

    The Cake that is Intelligence and Who Gets to Bake it: An AI Analogy and its Implications for Participation

    Authors: Martin Mundt, Anaelia Ovalle, Felix Friedrich, A Pranav, Subarnaduti Paul, Manuel Brack, Kristian Kersting, William Agnew

    Abstract: In a widely popular analogy by Turing Award Laureate Yann LeCun, machine intelligence has been compared to cake - where unsupervised learning forms the base, supervised learning adds the icing, and reinforcement learning is the cherry on top. We expand this 'cake that is intelligence' analogy from a simple structural metaphor to the full life-cycle of AI systems, extending it to sourcing of ingred…

    Submitted 6 February, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

  13. arXiv:2502.00382  [pdf, other]

    cs.CV cs.AI cs.LG

    Masked Generative Nested Transformers with Decode Time Scaling

    Authors: Sahil Goyal, Debapriya Tula, Gagan Jain, Pradeep Shenoy, Prateek Jain, Sujoy Paul

    Abstract: Recent advances in visual generation have made significant strides in producing content of exceptional quality. However, most methods suffer from a fundamental problem - a bottleneck of inference computational efficiency. Most of these algorithms involve multiple passes over a transformer model to generate tokens or denoise inputs. However, the model size is kept consistent throughout all iteratio…

    Submitted 1 February, 2025; originally announced February 2025.

  14. arXiv:2501.12682  [pdf, other]

    eess.AS cs.SD

    EmoFormer: A Text-Independent Speech Emotion Recognition using a Hybrid Transformer-CNN model

    Authors: Rashedul Hasan, Meher Nigar, Nursadul Mamun, Sayan Paul

    Abstract: Speech Emotion Recognition is a crucial area of research in human-computer interaction. While significant work has been done in this field, many state-of-the-art networks struggle to accurately recognize emotions in speech when the data is both speech and speaker-independent. To address this limitation, this study proposes EmoFormer, a hybrid model combining convolutional neural networks (CNNs) with Transformer encoders…

    Submitted 22 January, 2025; originally announced January 2025.

  15. arXiv:2501.07039  [pdf, other]

    cs.CV

    IoT-Based Real-Time Medical-Related Human Activity Recognition Using Skeletons and Multi-Stage Deep Learning for Healthcare

    Authors: Subrata Kumer Paul, Abu Saleh Musa Miah, Rakhi Rani Paul, Md. Ekramul Hamid, Jungpil Shin, Md Abdur Rahim

    Abstract: The Internet of Things (IoT) and mobile technology have significantly transformed healthcare by enabling real-time monitoring and diagnosis of patients. Recognizing medical-related human activities (MRHA) is pivotal for healthcare systems, particularly for identifying actions that are critical to patient well-being. However, challenges such as high computational demands, low accuracy, and limited…

    Submitted 12 January, 2025; originally announced January 2025.

  16. arXiv:2412.18081  [pdf, other]

    stat.ML cs.LG

    Heterogeneous transfer learning for high dimensional regression with feature mismatch

    Authors: Jae Ho Chang, Massimiliano Russo, Subhadeep Paul

    Abstract: We consider the problem of transferring knowledge from a source, or proxy, domain to a new target domain for learning a high-dimensional regression model with possibly different features. Recently, the statistical properties of homogeneous transfer learning have been investigated. However, most homogeneous transfer and multi-task learning methods assume that the target and proxy domains have the s…

    Submitted 23 December, 2024; originally announced December 2024.

  17. arXiv:2412.14672  [pdf, other]

    cs.CV cs.AI

    FiVL: A Framework for Improved Vision-Language Alignment through the Lens of Training, Evaluation and Explainability

    Authors: Estelle Aflalo, Gabriela Ben Melech Stan, Tiep Le, Man Luo, Shachar Rosenman, Sayak Paul, Shao-Yen Tseng, Vasudev Lal

    Abstract: Large Vision Language Models (LVLMs) have achieved significant progress in integrating visual and textual inputs for multimodal reasoning. However, a recurring challenge is ensuring these models utilize visual information as effectively as linguistic content when both modalities are necessary to formulate an accurate answer. We hypothesize that hallucinations arise due to the lack of effective vis…

    Submitted 19 March, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

  18. arXiv:2412.07485  [pdf, other]

    cs.RO

    Performance Evaluation of ROS2-DDS middleware implementations facilitating Cooperative Driving in Autonomous Vehicle

    Authors: Sumit Paul, Danh Lephuoc, Manfred Hauswirth

    Abstract: In the autonomous vehicle and self-driving paradigm, cooperative perception or exchanging sensor information among vehicles over wireless communication has added a new dimension. Generally, an autonomous vehicle is a special type of robot that requires real-time, highly reliable sensor inputs due to functional safety. Autonomous vehicles are equipped with a considerable number of sensors to provid…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: Edge AI meets Swarm Intelligence Technical Workshop, September 18, 2024, Dubrovnik, Croatia

  19. arXiv:2412.03895  [pdf, other]

    cs.CV cs.AI cs.LG

    A Noise is Worth Diffusion Guidance

    Authors: Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Jaewon Min, Minjae Kim, Wooseok Jang, Hyoungwon Cho, Sayak Paul, SeonHwa Kim, Eunju Cha, Kyong Hwan Jin, Seungryong Kim

    Abstract: Diffusion models excel in generating high-quality images. However, current diffusion models struggle to produce reliable images without guidance methods, such as classifier-free guidance (CFG). Are guidance methods truly necessary? Observing that noise obtained via diffusion inversion can reconstruct high-quality images without guidance, we focus on the initial noise of the denoising pipeline. By…

    Submitted 5 December, 2024; originally announced December 2024.

    Comments: Project page: https://cvlab-kaist.github.io/NoiseRefine/

  20. arXiv:2412.01487  [pdf, other]

    cs.AI

    FastRM: An efficient and automatic explainability framework for multimodal generative models

    Authors: Gabriela Ben-Melech Stan, Estelle Aflalo, Man Luo, Shachar Rosenman, Tiep Le, Sayak Paul, Shao-Yen Tseng, Vasudev Lal

    Abstract: Large Vision Language Models (LVLMs) have demonstrated remarkable reasoning capabilities over textual and visual inputs. However, these models remain prone to generating misinformation. Identifying and mitigating ungrounded responses is crucial for developing trustworthy AI. Traditional explainability methods such as gradient-based relevancy maps, offer insight into the decision process of models,…

    Submitted 6 May, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

  21. arXiv:2411.18519  [pdf, other]

    cs.RO cs.MA

    A Talent-infused Policy-gradient Approach to Efficient Co-Design of Morphology and Task Allocation Behavior of Multi-Robot Systems

    Authors: Prajit KrisshnaKumar, Steve Paul, Souma Chowdhury

    Abstract: Interesting and efficient collective behavior observed in multi-robot or swarm systems emerges from the individual behavior of the robots. The functional space of individual robot behaviors is in turn shaped or constrained by the robot's morphology or physical design. Thus the full potential of multi-robot systems can be realized by concurrently optimizing the morphology and behavior of individual…

    Submitted 27 November, 2024; originally announced November 2024.

    Comments: Presented in proceedings of the International Symposium on Distributed Autonomous Robotic Systems (DARS) 2024

  22. arXiv:2411.17826  [pdf, other]

    cs.RO cs.LG stat.ML

    Rate-Informed Discovery via Bayesian Adaptive Multifidelity Sampling

    Authors: Aman Sinha, Payam Nikdel, Supratik Paul, Shimon Whiteson

    Abstract: Ensuring the safety of autonomous vehicles (AVs) requires both accurate estimation of their performance and efficient discovery of potential failure cases. This paper introduces Bayesian adaptive multifidelity sampling (BAMS), which leverages the power of adaptive Bayesian sampling to achieve efficient discovery while simultaneously estimating the rate of adverse events. BAMS prioritizes explorati…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: Published at CoRL 2024: https://openreview.net/forum?id=bftFwjSJxk

  23. arXiv:2411.17421  [pdf, other]

    cs.FL nlin.CG

    Temporally Non-Uniform Cellular Automata (t-NUCA): Reversibility and Cyclic behavior

    Authors: Subrata Paul, Sukanta Das

    Abstract: In this work, we propose a variant of non-uniform cellular automata, named as Temporally Non-Uniform Cellular Automata (t-NUCAs), which temporally use two rules, $f$ and $g$ in a sequence $\mathcal{R}$. To observe reversibility in t-NUCAs, we study their injectivity and surjectivity properties. Unlike classical CAs, some irreversible t-NUCAs show the behavior similar to reversible t-NUCAs. To stud…

    Submitted 26 November, 2024; originally announced November 2024.

  24. arXiv:2411.15966  [pdf, other]

    cs.CV

    Gaussian Scenes: Pose-Free Sparse-View Scene Reconstruction using Depth-Enhanced Diffusion Priors

    Authors: Soumava Paul, Prakhar Kaushik, Alan Yuille

    Abstract: In this work, we introduce a generative approach for pose-free (without camera parameters) reconstruction of 360 scenes from a sparse set of 2D images. Pose-free scene reconstruction from incomplete, pose-free observations is usually regularized with depth estimation or 3D foundational priors. While recent advances have enabled sparse-view reconstruction of large complex scenes (with high degree o…

    Submitted 5 April, 2025; v1 submitted 24 November, 2024; originally announced November 2024.

    Comments: Project page is available at https://gaussianscenes.github.io/

  25. arXiv:2411.14770  [pdf, other]

    cs.RO

    Aim My Robot: Precision Local Navigation to Any Object

    Authors: Xiangyun Meng, Xuning Yang, Sanghun Jung, Fabio Ramos, Srid Sadhan Jujjavarapu, Sanjoy Paul, Dieter Fox

    Abstract: Existing navigation systems mostly consider "success" when the robot reaches within 1m radius to a goal. This precision is insufficient for emerging applications where the robot needs to be positioned precisely relative to an object for downstream tasks, such as docking, inspection, and manipulation. To this end, we design and implement Aim-My-Robot (AMR), a local navigation system that enables a…

    Submitted 27 December, 2024; v1 submitted 22 November, 2024; originally announced November 2024.

  26. arXiv:2411.13141  [pdf, other]

    cs.CC cs.DM cs.DS math.CO

    (Independent) Roman Domination Parameterized by Distance to Cluster

    Authors: Pradeesha Ashok, Gautam K. Das, Arti Pandey, Kaustav Paul, Subhabrata Paul

    Abstract: Given a graph $G=(V,E)$, a function $f:V\to \{0,1,2\}$ is said to be a \emph{Roman Dominating function} (RDF) if for every $v\in V$ with $f(v)=0$, there exists a vertex $u\in N(v)$ such that $f(u)=2$. A Roman Dominating function $f$ is said to be an \emph{Independent Roman Dominating function} (IRDF), if $V_1\cup V_2$ forms an independent set, where $V_i=\{v\in V~\vert~f(v)=i\}$, for…

    Submitted 20 November, 2024; originally announced November 2024.

    Comments: arXiv admin note: text overlap with arXiv:2405.10556 by other authors
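
The two definitions quoted in the abstract of entry 26 (RDF and IRDF) can be checked mechanically on small graphs. The following is a minimal sketch and is not taken from the paper: the adjacency-set representation and the function names `is_roman_dominating` and `is_independent_roman_dominating` are assumptions chosen for illustration, and the graph is assumed to be undirected with symmetric adjacency sets.

```python
from typing import Dict, Hashable, Set

# Hypothetical representation: an undirected graph as symmetric adjacency sets.
Graph = Dict[Hashable, Set[Hashable]]


def is_roman_dominating(graph: Graph, f: Dict[Hashable, int]) -> bool:
    """Check the RDF condition: every vertex labelled 0 has a neighbour labelled 2."""
    # f must label exactly the vertices of the graph with values in {0, 1, 2}.
    if set(f) != set(graph) or any(value not in (0, 1, 2) for value in f.values()):
        return False
    return all(
        any(f[u] == 2 for u in graph[v])
        for v in graph
        if f[v] == 0
    )


def is_independent_roman_dominating(graph: Graph, f: Dict[Hashable, int]) -> bool:
    """Check the IRDF condition: f is an RDF and V_1 union V_2 is an independent set."""
    if not is_roman_dominating(graph, f):
        return False
    support = {v for v in graph if f[v] in (1, 2)}  # V_1 union V_2
    # Independent set: no edge joins two vertices of the support.
    return all(u not in support for v in support for u in graph[v])


# Path a - b - c used as a small worked example.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
centre_only = {"a": 0, "b": 2, "c": 0}      # b dominates both endpoints
adjacent_support = {"a": 2, "b": 2, "c": 0}  # RDF, but a and b are adjacent
undominated = {"a": 1, "b": 1, "c": 0}       # c has no neighbour labelled 2
```

On the path above, `centre_only` satisfies both conditions, `adjacent_support` is an RDF but not an IRDF because its support contains the edge a-b, and `undominated` fails the RDF condition at vertex c.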

  27. Multi-agent Path Finding for Timed Tasks using Evolutionary Games

    Authors: Sheryl Paul, Anand Balakrishnan, Xin Qin, Jyotirmoy V. Deshmukh

    Abstract: Autonomous multi-agent systems such as hospital robots and package delivery drones often operate in highly uncertain environments and are expected to achieve complex temporal task objectives while ensuring safety. While learning-based methods such as reinforcement learning are popular methods to train single and multi-agent autonomous systems under user-specified and state-based reward functions,…

    Submitted 15 November, 2024; originally announced November 2024.

  28. arXiv:2411.04796  [pdf, other]

    cs.RO cs.AI cs.CV

    MPVO: Motion-Prior based Visual Odometry for PointGoal Navigation

    Authors: Sayan Paul, Ruddra dev Roychoudhury, Brojeshwar Bhowmick

    Abstract: Visual odometry (VO) is essential for enabling accurate point-goal navigation of embodied agents in indoor environments where GPS and compass sensors are unreliable and inaccurate. However, traditional VO methods face challenges in wide-baseline scenarios, where fast robot motions and low frames per second (FPS) during inference hinder their performance, leading to drift and catastrophic failures…

    Submitted 7 November, 2024; originally announced November 2024.

    Comments: Accepted in 50SFM Workshop of the 18th European Conference on Computer Vision (ECCV) 2024

  29. arXiv:2411.02993  [pdf]

    cs.DL

    Empowering Library Users: Creative Strategies for Engagement and Innovation

    Authors: Snehasish Paul, Shivali Chauhan, Atul Kumar Pal

    Abstract: This study investigated the integration of cutting-edge technologies and methodologies for creating dynamic, user-centered library environments. In creative strategies for engagement and innovation, library users must be empowered to undertake the new role of modernizing library services and enhancing user experiences. It also enhances the information management and user engagement. This can be at…

    Submitted 5 November, 2024; originally announced November 2024.

  30. A Comparative Study of Multiple Deep Learning Algorithms for Efficient Localization of Bone Joints in the Upper Limbs of Human Body

    Authors: Soumalya Bose, Soham Basu, Indranil Bera, Sambit Mallick, Snigdha Paul, Saumodip Das, Swarnendu Sil, Swarnava Ghosh, Anindya Sen

    Abstract: This paper addresses the medical imaging problem of joint detection in the upper limbs, viz. elbow, shoulder, wrist and finger joints. Localization of joints from X-Ray and Computerized Tomography (CT) scans is an essential step for the assessment of various bone-related medical conditions like Osteoarthritis, Rheumatoid Arthritis, and can even be used for automated bone fracture detection. Automa…

    Submitted 27 October, 2024; originally announced October 2024.

    Journal ref: Advances in Intelligent Systems and Computing, vol 1439. Springer, Singapore (2023)

  31. arXiv:2410.19852  [pdf, other]

    cs.LG cs.AI cs.GT cs.NE

    Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts

    Authors: Sheryl Paul, Jyotirmoy V. Deshmukh

    Abstract: Reinforcement learning (RL) has been successfully applied to solve the problem of finding obstacle-free paths for autonomous agents operating in stochastic and uncertain environments. However, when the underlying stochastic dynamics of the environment experiences drastic distribution shifts, the optimal policy obtained in the trained environment may be sub-optimal or may entirely fail in helping f…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: Published in ECAI 2024

  32. arXiv:2410.19767  [pdf, other]

    cs.IT cs.LG

    Learning Robust Representations for Communications over Interference-limited Channels

    Authors: Shubham Paul, Sudharsan Senthil, Preethi Seshadri, Nambi Seshadri, R David Koilpillai

    Abstract: In the context of cellular networks, users located at the periphery of cells are particularly vulnerable to substantial interference from neighbouring cells, which can be represented as a two-user interference channel. This study introduces two highly effective methodologies, namely TwinNet and SiameseNet, using autoencoders, tailored for the design of encoders and decoders for block transmission…

    Submitted 13 October, 2024; originally announced October 2024.

    Comments: Submitted to WCNC 2025

  33. arXiv:2410.14219  [pdf, other]

    cs.AI cs.LG cs.LO

    Formal Explanations for Neuro-Symbolic AI

    Authors: Sushmita Paul, Jinqiang Yu, Jip J. Dekker, Alexey Ignatiev, Peter J. Stuckey

    Abstract: Despite the practical success of Artificial Intelligence (AI), current neural AI algorithms face two significant issues. First, the decisions made by neural architectures are often prone to bias and brittleness. Second, when a chain of reasoning is required, neural systems often perform poorly. Neuro-symbolic artificial intelligence is a promising approach that tackles these (and other) weaknesses…

    Submitted 18 October, 2024; originally announced October 2024.

  34. arXiv:2410.05800  [pdf, other]

    cs.CV cs.AI

    Core Tokensets for Data-efficient Sequential Training of Transformers

    Authors: Subarnaduti Paul, Manuel Brack, Patrick Schramowski, Kristian Kersting, Martin Mundt

    Abstract: Deep networks are frequently tuned to novel tasks and continue learning from ongoing data streams. Such sequential training requires consolidation of new and past information, a challenge predominantly addressed by retaining the most important data points - formally known as coresets. Traditionally, these coresets consist of entire samples, such as images or sentences. However, recent transformer…

    Submitted 8 October, 2024; originally announced October 2024.

  35. arXiv:2409.14198  [pdf, other]

    eess.IV cs.CV

    A Sinkhorn Regularized Adversarial Network for Image Guided DEM Super-resolution using Frequency Selective Hybrid Graph Transformer

    Authors: Subhajit Paul, Ashutosh Gupta

    Abstract: Digital Elevation Model (DEM) is an essential aspect in the remote sensing (RS) domain to analyze various applications related to surface elevations. Here, we address the generation of high-resolution (HR) DEMs using HR multi-spectral (MX) satellite imagery as a guide by introducing a novel hybrid transformer model consisting of Densely connected Multi-Residual Block (DMRB) and multi-headed Freque…

    Submitted 21 September, 2024; originally announced September 2024.

    Comments: 25 pages, 19 figures. arXiv admin note: substantial text overlap with arXiv:2311.16490

    Journal ref: International Conference on Pattern Recognition (ICPR), 2024

  36. arXiv:2409.13977  [pdf, other]

    cs.CV

    Improving 3D Semi-supervised Learning by Effectively Utilizing All Unlabelled Data

    Authors: Sneha Paul, Zachary Patterson, Nizar Bouguila

    Abstract: Semi-supervised learning (SSL) has shown its effectiveness in learning effective 3D representation from a small amount of labelled data while utilizing large unlabelled data. Traditional semi-supervised approaches rely on the fundamental concept of predicting pseudo-labels for unlabelled data and incorporating them into the learning process. However, we identify that the existing methods do not fu…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: Accepted at the European Conference on Computer Vision, ECCV 2024

  37. arXiv:2409.02056  [pdf, other]

    cs.CV

    F2former: When Fractional Fourier Meets Deep Wiener Deconvolution and Selective Frequency Transformer for Image Deblurring

    Authors: Subhajit Paul, Sahil Kumawat, Ashutosh Gupta, Deepak Mishra

    Abstract: Recent progress in image deblurring techniques focuses mainly on operating in both frequency and spatial domains using the Fourier transform (FT) properties. However, their performance is limited due to the dependency of FT on stationary signals and its lack of capability to extract spatial-frequency properties. In this paper, we propose a novel approach based on the Fractional Fourier Transform (…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 20 pages, 21 figures

  38. arXiv:2409.01129  [pdf, other]

    cs.LG cs.IT

    Learning Robust Representations for Communications over Noisy Channels

    Authors: Sudharsan Senthil, Shubham Paul, Nambi Seshadri, R. David Koilpillai

    Abstract: We explore the use of FCNNs (Fully Connected Neural Networks) for designing end-to-end communication systems without taking any inspiration from existing classical communications models or error control coding. This work relies solely on the tools of information theory and machine learning. We investigate the impact of using various cost functions based on mutual information and pairwise distances…

    Submitted 7 September, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: Submitted to WCNC 2025 for review

  39. arXiv:2408.13467  [pdf, other]

    cs.LG cs.AI cs.DC

    LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs

    Authors: Chansung Park, Juyong Jiang, Fan Wang, Sayak Paul, Jing Tang

    Abstract: The widespread adoption of cloud-based proprietary large language models (LLMs) has introduced significant challenges, including operational dependencies, privacy concerns, and the necessity of continuous internet connectivity. In this work, we introduce an LLMOps pipeline, "LlamaDuo", for the seamless migration of knowledge and abilities from service-oriented LLMs to smaller, locally manageable m…

    Submitted 28 August, 2024; v1 submitted 24 August, 2024; originally announced August 2024.

    Comments: 28 pages, 18 figures, 6 tables

  40. arXiv:2408.06459  [pdf, other]

    eess.IV cs.CV

    InfLocNet: Enhanced Lung Infection Localization and Disease Detection from Chest X-Ray Images Using Lightweight Deep Learning

    Authors: Md. Asiful Islam Miah, Shourin Paul, Sunanda Das, M. M. A. Hashem

    Abstract: In recent years, the integration of deep learning techniques into medical imaging has revolutionized the diagnosis and treatment of lung diseases, particularly in the context of COVID-19 and pneumonia. This paper presents a novel, lightweight deep learning based segmentation-classification network designed to enhance the detection and localization of lung infections using chest X-ray images. By le…

    Submitted 12 August, 2024; originally announced August 2024.

  41. Process-constrained batch Bayesian approaches for yield optimization in multi-reactor systems

    Authors: Markus Grimm, Sébastien Paul, Pierre Chainais

    Abstract: The optimization of yields in multi-reactor systems, which are advanced tools in heterogeneous catalysis research, presents a significant challenge due to hierarchical technical constraints. To this respect, this work introduces a novel approach called process-constrained batch Bayesian optimization via Thompson sampling (pc-BO-TS) and its generalized hierarchical extension (hpc-BO-TS). This metho…

    Submitted 5 August, 2024; originally announced August 2024.

    Journal ref: Volume 189, October 2024, 108779

  42. arXiv:2407.19985  [pdf, other]

    cs.CV cs.AI cs.LG

    Mixture of Nested Experts: Adaptive Processing of Visual Tokens

    Authors: Gagan Jain, Nidhi Hegde, Aditya Kusupati, Arsha Nagrani, Shyamal Buch, Prateek Jain, Anurag Arnab, Sujoy Paul

    Abstract: The visual medium (images and videos) naturally contains a large amount of information redundancy, thereby providing a great opportunity for leveraging efficiency in processing. While Vision Transformer (ViT) based models scale effectively to large data regimes, they fail to capitalize on this inherent redundancy, leading to higher computational costs. Mixture of Experts (MoE) networks demonstrate…

    Submitted 30 July, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

  43. arXiv:2407.13933  [pdf, other]

    cs.CV

    Unsupervised Video Highlight Detection by Learning from Audio and Visual Recurrence

    Authors: Zahidul Islam, Sujoy Paul, Mrigank Rochan

    Abstract: With the exponential growth of video content, the need for automated video highlight detection to extract key moments or highlights from lengthy videos has become increasingly pressing. This technology has the potential to enhance user experiences by allowing quick access to relevant content across diverse domains. Existing methods typically rely either on expensive manually labeled frame-level an…

    Submitted 14 May, 2025; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

  44. arXiv:2407.12753  [pdf, other]

    cs.CV cs.AI cs.LG

    LookupViT: Compressing visual information to a limited number of tokens

    Authors: Rajat Koner, Gagan Jain, Prateek Jain, Volker Tresp, Sujoy Paul

    Abstract: Vision Transformers (ViT) have emerged as the de-facto choice for numerous industry grade vision solutions. But their inference cost can be prohibitive for many settings, as they compute self-attention in each layer which suffers from quadratic computational complexity in the number of tokens. On the other hand, spatial information in images and spatio-temporal information in videos is usually spa…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  45. arXiv:2407.12113  [pdf, other]

    cs.LG cs.AI cs.MA

    A Graph-based Adversarial Imitation Learning Framework for Reliable & Realtime Fleet Scheduling in Urban Air Mobility

    Authors: Prithvi Poddar, Steve Paul, Souma Chowdhury

    Abstract: The advent of Urban Air Mobility (UAM) presents the scope for a transformative shift in the domain of urban transportation. However, its widespread adoption and economic viability depends in part on the ability to optimally schedule the fleet of aircraft across vertiports in a UAM network, under uncertainties attributed to airspace congestion, changing weather conditions, and varying demands. This…

    Submitted 5 September, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Presented at the AIAA Aviation Forum 2024

  46. arXiv:2407.06096

    cs.CV

    Muzzle-Based Cattle Identification System Using Artificial Intelligence (AI)

    Authors: Hasan Zohirul Islam, Safayet Khan, Sanjib Kumar Paul, Sheikh Imtiaz Rahi, Fahim Hossain Sifat, Md. Mahadi Hasan Sany, Md. Shahjahan Ali Sarker, Tareq Anam, Ismail Hossain Polas

    Abstract: Absence of tamper-proof cattle identification technology was a significant problem preventing insurance companies from providing livestock insurance. This lack of technology had devastating financial consequences for marginal farmers as they did not have the opportunity to claim compensation for any unexpected events such as the accidental death of cattle in Bangladesh. Using machine learning and…

    Submitted 9 October, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: In claimed novel augmentation techniques, there are some mistakes in equations that convey wrong result, which should not be

  47. arXiv:2407.05399  [pdf, other]

    cs.CL cs.AI cs.LG

    IL-TUR: Benchmark for Indian Legal Text Understanding and Reasoning

    Authors: Abhinav Joshi, Shounak Paul, Akshat Sharma, Pawan Goyal, Saptarshi Ghosh, Ashutosh Modi

    Abstract: Legal systems worldwide are inundated with exponential growth in cases and documents. There is an imminent need to develop NLP and ML techniques for automatically processing and understanding legal documents to streamline the legal system. However, evaluating and comparing various NLP models designed specifically for the legal domain is challenging. This paper addresses this challenge by proposing…

    Submitted 26 November, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted at ACL 2024 Main Conference; 40 Pages (9 Pages + References + Appendix)

  48. arXiv:2407.04589  [pdf, other]

    cs.LG

    Remembering Everything Makes You Vulnerable: A Limelight on Machine Unlearning for Personalized Healthcare Sector

    Authors: Ahan Chatterjee, Sai Anirudh Aryasomayajula, Rajat Chaudhari, Subhajit Paul, Vishwa Mohan Singh

    Abstract: As the prevalence of data-driven technologies in healthcare continues to rise, concerns regarding data privacy and security become increasingly paramount. This thesis aims to address the vulnerability of personalized healthcare models, particularly in the context of ECG monitoring, to adversarial attacks that compromise patient privacy. We propose an approach termed "Machine Unlearning" to mitigat…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 15 Pages, Exploring unlearning techniques on ECG Classifier

  49. arXiv:2406.16612  [pdf, other]

    cs.RO cs.MA

    Towards Physically Talented Aerial Robots with Tactically Smart Swarm Behavior thereof: An Efficient Co-design Approach

    Authors: Prajit KrisshnaKumar, Steve Paul, Hemanth Manjunatha, Mary Corra, Ehsan Esfahani, Souma Chowdhury

    Abstract: The collective performance or capacity of collaborative autonomous systems such as a swarm of robots is jointly influenced by the morphology and the behavior of individual systems in that collective. In that context, this paper explores how morphology impacts the learned tactical behavior of unmanned aerial/ground robots performing reconnaissance and search & rescue. This is achieved by presenting…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted for presentation in proceedings of ASME IDETC-CIE 2024

  50. arXiv:2406.10328  [pdf, other]

    cs.CV cs.CL cs.LG

    From Pixels to Prose: A Large Dataset of Dense Image Captions

    Authors: Vasu Singla, Kaiyu Yue, Sukriti Paul, Reza Shirkavand, Mayuka Jayawardhana, Alireza Ganjdanesh, Heng Huang, Abhinav Bhatele, Gowthami Somepalli, Tom Goldstein

    Abstract: Training large vision-language models requires extensive, high-quality image-text pairs. Existing web-scraped datasets, however, are noisy and lack detailed image descriptions. To bridge this gap, we introduce PixelProse, a comprehensive dataset of over 16M (million) synthetically generated captions, leveraging cutting-edge vision-language models for detailed and accurate descriptions. To ensure d…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: pixelprose 16M dataset