Search | arXiv e-print repository

Dissipation in fermionic two-body continuous-time quantum walk under the steepest entropy ascent formalism

Authors: Rohit Kishan Ray, R. Srikanth, Sonjoy Majumder

Abstract: Quantum walks play a crucial role in quantum algorithms and computational problems. Many-body quantum walks can reveal and exploit quantum correlations that are unavailable for single-walker cases. Studying quantum walks under noise and dissipation, particularly in multi-walker systems, has significant implications. In this context, we use a thermodynamically consistent formalism of dissipation modeling, namely the steepest entropy ascent (SEA) formalism. We analyze two spinless fermionic continuous-time walkers on a 1D graph with tunable Hubbard and extended Hubbard-like interactions. By contrasting SEA-driven dynamics with unitary evolution, we systematically investigate how interaction strengths modulate thermalization and entropy production. Our findings highlight the relevance of SEA formalism in modeling nonlinear dissipation in many-body quantum systems and its implications for quantum thermalization.

Submitted 30 January, 2025; originally announced January 2025.

Comments: 13 pages, 11 figures

arXiv:2501.18229 [pdf, other]

GPD: Guided Polynomial Diffusion for Motion Planning

Authors: Ajit Srikanth, Parth Mahanjan, Kallol Saha, Vishal Mandadi, Pranjal Paul, Pawan Wadhwani, Brojeshwar Bhowmick, Arun Singh, Madhava Krishna

Abstract: Diffusion-based motion planners are becoming popular due to their well-established performance improvements, stemming from sample diversity and the ease of incorporating new constraints directly during inference. However, a primary limitation of the diffusion process is the requirement for a substantial number of denoising steps, especially when the denoising process is coupled with gradient-based guidance. In this paper, we introduce diffusion in the parametric space of trajectories, where the parameters are represented as Bernstein coefficients. We show that this representation greatly improves the effectiveness of the cost function guidance and the inference speed. We also introduce a novel stitching algorithm that leverages the diversity in diffusion-generated trajectories to produce collision-free trajectories with just a single cost function-guided model. We demonstrate that our approaches outperform current SOTA diffusion-based motion planners for manipulators and provide an ablation study on key components.

Submitted 30 January, 2025; originally announced January 2025.

arXiv:2501.17361 [pdf, other]

The M-factor: A Novel Metric for Evaluating Neural Architecture Search in Resource-Constrained Environments

Authors: Srikanth Thudumu, Hy Nguyen, Hung Du, Nhat Duong, Zafaryab Rasool, Rena Logothetis, Scott Barnett, Rajesh Vasa, Kon Mouzakis

Abstract: Neural Architecture Search (NAS) aims to automate the design of deep neural networks. However, existing NAS techniques often focus on maximising accuracy, neglecting model efficiency. This limitation restricts their use in resource-constrained environments like mobile devices and edge computing systems. Moreover, current evaluation metrics prioritise performance over efficiency, lacking a balanced approach for assessing architectures suitable for constrained scenarios. To address these challenges, this paper introduces the M-factor, a novel metric combining model accuracy and size. Four diverse NAS techniques are compared: Policy-Based Reinforcement Learning, Regularised Evolution, Tree-structured Parzen Estimator (TPE), and Multi-trial Random Search. These techniques represent different NAS paradigms, providing a comprehensive evaluation of the M-factor. The study analyses ResNet configurations on the CIFAR-10 dataset, with a search space of 19,683 configurations. Experiments reveal that Policy-Based Reinforcement Learning and Regularised Evolution achieved M-factor values of 0.84 and 0.82, respectively, while Multi-trial Random Search attained 0.75, and TPE reached 0.67. Policy-Based Reinforcement Learning exhibited performance changes after 39 trials, while Regularised Evolution optimised within 20 trials. The research investigates the optimisation dynamics and trade-offs between accuracy and model size for each strategy. Findings indicate that, in some cases, random search performed comparably to more complex algorithms when assessed using the M-factor. These results highlight how the M-factor addresses the limitations of existing metrics by guiding NAS towards balanced architectures, offering valuable insights for selecting strategies in scenarios requiring both performance and efficiency.

Submitted 28 January, 2025; originally announced January 2025.

arXiv:2501.16753 [pdf, other]

Overcoming Semantic Dilution in Transformer-Based Next Frame Prediction

Authors: Hy Nguyen, Srikanth Thudumu, Hung Du, Rajesh Vasa, Kon Mouzakis

Abstract: Next-frame prediction in videos is crucial for applications such as autonomous driving, object tracking, and motion prediction. The primary challenge in next-frame prediction lies in effectively capturing and processing both spatial and temporal information from previous video sequences. The transformer architecture, known for its prowess in handling sequence data, has made remarkable progress in this domain. However, transformer-based next-frame prediction models face notable issues: (a) The multi-head self-attention (MHSA) mechanism requires the input embedding to be split into $N$ chunks, where $N$ is the number of heads. Each segment captures only a fraction of the original embedding's information, which distorts the representation of the embedding in the latent space, resulting in a semantic dilution problem; (b) These models predict the embeddings of the next frames rather than the frames themselves, but the loss function is based on the errors of the reconstructed frames, not the predicted embeddings, which creates a discrepancy between the training objective and the model output. We propose a Semantic Concentration Multi-Head Self-Attention (SCMHSA) architecture, which effectively mitigates semantic dilution in transformer-based next-frame prediction. Additionally, we introduce a loss function that optimizes SCMHSA in the latent space, aligning the training objective more closely with the model output. Our method demonstrates superior performance compared to the original transformer-based predictors.

Submitted 28 January, 2025; originally announced January 2025.

arXiv:2501.15695 [pdf, other]

Contextual Knowledge Sharing in Multi-Agent Reinforcement Learning with Decentralized Communication and Coordination

Authors: Hung Du, Srikanth Thudumu, Hy Nguyen, Rajesh Vasa, Kon Mouzakis

Abstract: Decentralized Multi-Agent Reinforcement Learning (Dec-MARL) has emerged as a pivotal approach for addressing complex tasks in dynamic environments. Existing Multi-Agent Reinforcement Learning (MARL) methodologies typically assume a shared objective among agents and rely on centralized control. However, many real-world scenarios feature agents with individual goals and limited observability of other agents, complicating coordination and hindering adaptability. Existing Dec-MARL strategies prioritize either communication or coordination, lacking an integrated approach that leverages both. This paper presents a novel Dec-MARL framework that integrates peer-to-peer communication and coordination, incorporating goal-awareness and time-awareness into the agents' knowledge-sharing processes. Our framework equips agents with the ability to (i) share contextually relevant knowledge to assist other agents, and (ii) reason based on information acquired from multiple agents, while considering their own goals and the temporal context of prior knowledge. We evaluate our approach through several complex multi-agent tasks in environments with dynamically appearing obstacles. Our work demonstrates that incorporating goal-aware and time-aware knowledge sharing significantly enhances overall performance.

Submitted 26 January, 2025; originally announced January 2025.

arXiv:2501.14000 [pdf, other]

Local Control Networks (LCNs): Optimizing Flexibility in Neural Network Data Pattern Capture

Authors: Hy Nguyen, Duy Khoa Pham, Srikanth Thudumu, Hung Du, Rajesh Vasa, Kon Mouzakis

Abstract: The widespread use of Multi-layer perceptrons (MLPs) often relies on a fixed activation function (e.g., ReLU, Sigmoid, Tanh) for all nodes within the hidden layers. While effective in many scenarios, this uniformity may limit the network's ability to capture complex data patterns. We argue that employing the same activation function at every node is suboptimal and propose leveraging different activation functions at each node to increase flexibility and adaptability. To achieve this, we introduce Local Control Networks (LCNs), which leverage B-spline functions to enable distinct activation curves at each node. Our mathematical analysis demonstrates the properties and benefits of LCNs over conventional MLPs. In addition, we demonstrate that more complex architectures, such as Kolmogorov-Arnold Networks (KANs), are unnecessary in certain scenarios, and LCNs can be a more efficient alternative. Empirical experiments on various benchmarks and datasets validate our theoretical findings. In computer vision tasks, LCNs achieve marginal improvements over MLPs and outperform KANs by approximately 5%, while also being more computationally efficient than KANs. In basic machine learning tasks, LCNs show a 1% improvement over MLPs and a 0.6% improvement over KANs. For symbolic formula representation tasks, LCNs perform on par with KANs, with both architectures outperforming MLPs. Our findings suggest that diverse activations at the node level can lead to improved performance and efficiency.

Submitted 25 April, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

arXiv:2501.13994 [pdf, other]

CSAOT: Cooperative Multi-Agent System for Active Object Tracking

Authors: Hy Nguyen, Bao Pham, Hung Du, Srikanth Thudumu, Rajesh Vasa, Kon Mouzakis

Abstract: Object Tracking is essential for many computer vision applications, such as autonomous navigation, surveillance, and robotics. Unlike Passive Object Tracking (POT), which relies on static camera viewpoints to detect and track objects across consecutive frames, Active Object Tracking (AOT) requires a controller agent to actively adjust its viewpoint to maintain visual contact with a moving target in complex environments. Existing AOT solutions are predominantly single-agent-based, which struggle in dynamic and complex scenarios due to limited information gathering and processing capabilities, often resulting in suboptimal decision-making. Alleviating these limitations necessitates the development of a multi-agent system where different agents perform distinct roles and collaborate to enhance learning and robustness in dynamic and complex environments. Although some multi-agent approaches exist for AOT, they typically rely on external auxiliary agents, which require additional devices, making them costly. In contrast, we introduce the Collaborative System for Active Object Tracking (CSAOT), a method that leverages multi-agent deep reinforcement learning (MADRL) and a Mixture of Experts (MoE) framework to enable multiple agents to operate on a single device, thereby improving tracking performance and reducing costs. Our approach enhances robustness against occlusions and rapid motion while optimizing camera movements to extend tracking duration. We validated the effectiveness of CSAOT on various interactive maps with dynamic and stationary obstacles.

Submitted 23 January, 2025; originally announced January 2025.

arXiv:2501.13992 [pdf, other]

Dual-Branch HNSW Approach with Skip Bridges and LID-Driven Optimization

Authors: Hy Nguyen, Nguyen Hung Nguyen, Nguyen Linh Bao Nguyen, Srikanth Thudumu, Hung Du, Rajesh Vasa, Kon Mouzakis

Abstract: The Hierarchical Navigable Small World (HNSW) algorithm is widely used for approximate nearest neighbor (ANN) search, leveraging the principles of navigable small-world graphs. However, it faces some limitations. The first is the local optima problem, which arises from the algorithm's greedy search strategy, selecting neighbors based solely on proximity at each step. This often leads to cluster disconnections. The second limitation is that HNSW frequently fails to achieve logarithmic complexity, particularly in high-dimensional datasets, due to the exhaustive traversal through each layer. To address these limitations, we propose a novel algorithm that mitigates local optima and cluster disconnections while enhancing the construction speed and maintaining inference speed. The first component is a dual-branch HNSW structure with LID-based insertion mechanisms, enabling traversal from multiple directions. This improves outlier node capture, enhances cluster connectivity, accelerates construction speed and reduces the risk of local minima. The second component incorporates a bridge-building technique that bypasses redundant intermediate layers, maintaining inference speed and making up for the additional computational overhead introduced by the dual-branch structure. Experiments on various benchmarks and datasets showed that our algorithm outperforms the original HNSW in both accuracy and speed. We evaluated six datasets across Computer Vision (CV) and Natural Language Processing (NLP), showing recall improvements of 18% in NLP and up to 30% in CV tasks while reducing the construction time by up to 20% and maintaining the inference speed. We did not observe any trade-offs in our algorithm. Ablation studies revealed that LID-based insertion had the greatest impact on performance, followed by the dual-branch structure and bridge-building components.

Submitted 25 April, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

arXiv:2501.08350 [pdf]

Symmetry-Breaking of Turbulent Flow in Periodic Porous Media at Intermediate Porosities

Authors: Vishal Srikanth, Andrey V Kuznetsov

Abstract: This paper presents a novel discovery of a symmetry-breaking effect in porous media with porosity between 0.8 and 0.9, which we refer to as the intermediate porosity flow regime. Using large eddy simulation, we studied how heat transfer and turbulent convection occur within these materials at a microscopic level. We observed symmetry-breaking in porous structures made of regularly spaced circular cylinders, a common design in heat exchangers, immediately following the laminar to turbulent flow transition between Reynolds numbers of 37 and 100. Asymmetric patterns persisted up to Reynolds numbers of 1,000. The initial breakdown of symmetry occurs through a Hopf bifurcation, creating an oscillating flow pattern as shear layers interact around the solid obstacles. When the flow becomes turbulent, random variations in the timing of vortex oscillations (caused by the secondary instability) create asymmetric distributions of fluid velocity and temperature throughout the porous space. This leads to the formation of alternating channels with high and low velocity fluid flow. At the macroscale level, this loss of symmetry creates residual transverse drag force components and asymmetric heat flux distribution on the solid obstacle surfaces. Interestingly, the oscillating flow pattern promotes attached flow on the circular cylinder surfaces, which enhances heat transfer from the cylinders to the fluid. We observe that this secondary flow instability is the primary mechanism of enhanced turbulent heat flux from porous media with circular cylinders compared to those with square cylinders.

Submitted 12 January, 2025; originally announced January 2025.

Comments: arXiv admin note: substantial text overlap with arXiv:2407.12955

arXiv:2501.07422 [pdf, other]

Characterisation of Open Quantum System Dynamics based on Information Back-flow

Authors: Vijay Pathak, R. Srikanth

Abstract: For unital dynamics, we show that a generalized trace distance measure offers no advantage over the trace distance measure for witnessing non-Markovianity. We determine the class of non-unital channels where the standard trace distance measure is insufficient here and the generalized measure is necessary. Finally, we assess the status of the GTD measure as an indicator of information flow between an open system and its environment.

Submitted 13 January, 2025; originally announced January 2025.

arXiv:2501.06846 [pdf, ps, other]

On the eternal non-Markovianity of qubit maps

Authors: Vinayak Jagadish, R. Srikanth

Abstract: As is well known, unital Pauli maps can be eternally non-CP-divisible. In contrast, here we show that in the case of non-unital maps, eternal non-Markovianity in the non-unital part is ruled out. In the unital case, the eternal non-Markovianity can be obtained by a convex combination of two dephasing semigroups, but not all three of them. We study these results and the ramifications arising from them.

Submitted 12 January, 2025; originally announced January 2025.

Comments: 4 pages

arXiv:2501.06762 [pdf]

Improving the adaptive and continuous learning capabilities of artificial neural networks: Lessons from multi-neuromodulatory dynamics

Authors: Jie Mei, Alejandro Rodriguez-Garcia, Daigo Takeuchi, Gabriel Wainstein, Nina Hubig, Yalda Mohsenzadeh, Srikanth Ramaswamy

Abstract: Continuous, adaptive learning (the ability to adapt to the environment and improve performance) is a hallmark of both natural and artificial intelligence. Biological organisms excel in acquiring, transferring, and retaining knowledge while adapting to dynamic environments, making them a rich source of inspiration for artificial neural networks (ANNs). This study explores how neuromodulation, a fundamental feature of biological learning systems, can help address challenges such as catastrophic forgetting and enhance the robustness of ANNs in continuous learning scenarios. Driven by neuromodulators including dopamine (DA), acetylcholine (ACh), serotonin (5-HT) and noradrenaline (NA), neuromodulatory processes in the brain operate at multiple scales, facilitating dynamic responses to environmental changes through mechanisms ranging from local synaptic plasticity to global network-wide adaptability. Importantly, the relationship between neuromodulators and their interplay in the modulation of sensory and cognitive processes are more complex than expected, demonstrating a "many-to-one" neuromodulator-to-task mapping. To inspire the design of novel neuromodulation-aware learning rules, we highlight (i) how multi-neuromodulatory interactions enrich single-neuromodulator-driven learning, (ii) the impact of neuromodulators at multiple spatial and temporal scales, and correspondingly, (iii) strategies to integrate neuromodulated learning into or approximate it in ANNs. To illustrate these principles, we present a case study to demonstrate how neuromodulation-inspired mechanisms, such as DA-driven reward processing and NA-based cognitive flexibility, can enhance ANN performance in a Go/No-Go task. By integrating multi-scale neuromodulation, we aim to bridge the gap between biological learning and artificial systems, paving the way for ANNs with greater flexibility, robustness, and adaptability.

Submitted 12 January, 2025; originally announced January 2025.

arXiv:2501.02274 [pdf, other]

The Solar Ultraviolet Imaging Telescope on board Aditya-L1

Authors: Durgesh Tripathi, A. N. Ramaprakash, Sreejith Padinhatteeri, Janmejoy Sarkar, Mahesh Burse, Anurag Tyagi, Ravi Kesharwani, Sakya Sinha, Bhushan Joshi, Rushikesh Deogaonkar, Soumya Roy, V. N. Nived, Rahul Gopalakrishnan, Akshay Kulkarni, Aafaque Khan, Avyarthana Ghosh, Chaitanya Rajarshi, Deepa Modi, Ghanshyam Kumar, Reena Yadav, Manoj Varma, Raja Bayanna, Pravin Chordia, Mintu Karmakar, Linn Abraham, et al. (53 additional authors not shown)

Abstract: The Solar Ultraviolet Imaging Telescope (SUIT) is an instrument on the Aditya-L1 mission of the Indian Space Research Organization (ISRO) launched on September 02, 2023. SUIT continuously provides near-simultaneous full-disk and region-of-interest images of the Sun, slicing through the photosphere and chromosphere and covering a field of view up to 1.5 solar radii. For this purpose, SUIT uses 11 filters tuned at different wavelengths in the 200-400 nm range, including the Mg II h and k and Ca II H spectral lines. The observations made by SUIT help us understand the magnetic coupling of the lower and middle solar atmosphere. In addition, for the first time, it allows the measurements of spatially resolved solar broad-band radiation in the near and mid ultraviolet, which will help constrain the variability of the solar ultraviolet irradiance in a wavelength range that is central for the chemistry of the Earth's atmosphere. This paper discusses the details of the instrument and data products.

Submitted 10 January, 2025; v1 submitted 4 January, 2025; originally announced January 2025.

Comments: 37 pages, Accepted for Publication in Solar Physics

arXiv:2412.18566 [pdf, other]

Zero-resource Speech Translation and Recognition with LLMs

Authors: Karel Mundnich, Xing Niu, Prashant Mathur, Srikanth Ronanki, Brady Houston, Veera Raghavendra Elluru, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Anshu Bhatia, Daniel Garcia-Romero, Kyu J. Han, Katrin Kirchhoff

Abstract: Despite recent advancements in speech processing, zero-resource speech translation (ST) and automatic speech recognition (ASR) remain challenging problems. In this work, we propose to leverage a multilingual Large Language Model (LLM) to perform ST and ASR in languages for which the model has never seen paired audio-text data. We achieve this by using a pre-trained multilingual speech encoder, a multilingual LLM, and a lightweight adaptation module that maps the audio representations to the token embedding space of the LLM. We perform several experiments both in ST and ASR to understand how to best train the model and what data has the most impact on performance in previously unseen languages. In ST, our best model is capable of achieving BLEU scores over 23 in CoVoST2 for two previously unseen languages, while in ASR, we achieve WERs of up to 28.2%. We finally show that the performance of our system is bounded by the ability of the LLM to output text in the desired language.

Submitted 30 December, 2024; v1 submitted 24 December, 2024; originally announced December 2024.

Comments: ICASSP 2025, 5 pages, 2 figures, 2 tables

arXiv:2412.16530 [pdf, other]

Improving Lip-synchrony in Direct Audio-Visual Speech-to-Speech Translation

Authors: Lucas Goncalves, Prashant Mathur, Xing Niu, Brady Houston, Chandrashekhar Lavania, Srikanth Vishnubhotla, Lijia Sun, Anthony Ferritto

Abstract: Audio-Visual Speech-to-Speech Translation typically prioritizes improving translation quality and naturalness. However, an equally critical aspect in audio-visual content is lip-synchrony (ensuring that the movements of the lips match the spoken content), which is essential for maintaining realism in dubbed videos. Despite its importance, the inclusion of lip-synchrony constraints in AVS2S models has been largely overlooked. This study addresses this gap by integrating a lip-synchrony loss into the training process of AVS2S models. Our proposed method significantly enhances lip-synchrony in direct audio-visual speech-to-speech translation, achieving an average LSE-D score of 10.67, representing a 9.2% reduction in LSE-D over a strong baseline across four language pairs. Additionally, it maintains the naturalness and high quality of the translated speech when overlaid onto the original video, without any degradation in translation quality.

Submitted 21 December, 2024; originally announced December 2024.

Comments: Accepted at ICASSP, 4 pages

arXiv:2412.16500 [pdf, other]

Speech Retrieval-Augmented Generation without Automatic Speech Recognition

Authors: Do June Min, Karel Mundnich, Andy Lapastora, Erfan Soltanmohammadi, Srikanth Ronanki, Kyu Han

Abstract: One common approach for question answering over speech data is to first transcribe speech using automatic speech recognition (ASR) and then employ text-based retrieval-augmented generation (RAG) on the transcriptions. While this cascaded pipeline has proven effective in many practical settings, ASR errors can propagate to the retrieval and generation steps. To overcome this limitation, we introduce SpeechRAG, a novel framework designed for open-question answering over spoken data. Our proposed approach fine-tunes a pre-trained speech encoder into a speech adapter fed into a frozen large language model (LLM)-based retrieval model. By aligning the embedding spaces of text and speech, our speech retriever directly retrieves audio passages from text-based queries, leveraging the retrieval capacity of the frozen text retriever. Our retrieval experiments on spoken question answering datasets show that direct speech retrieval does not degrade over the text-based baseline, and outperforms the cascaded systems using ASR. For generation, we use a speech language model (SLM) as a generator, conditioned on audio passages rather than transcripts. Without fine-tuning of the SLM, this approach outperforms cascaded text-based models when there is high WER in the transcripts.

Submitted 3 January, 2025; v1 submitted 21 December, 2024; originally announced December 2024.

Comments: ICASSP 2025

arXiv:2412.16429 [pdf, other]

LearnLM: Improving Gemini for Learning

Authors: LearnLM Team, Abhinit Modi, Aditya Srikanth Veerubhotla, Aliya Rysbek, Andrea Huber, Brett Wiltshire, Brian Veprek, Daniel Gillick, Daniel Kasenberg, Derek Ahmed, Irina Jurenka, James Cohan, Jennifer She, Julia Wilkowski, Kaiz Alarakyia, Kevin R. McKee, Lisa Wang, Markus Kunesch, Mike Schaekermann, Miruna Pîslar, Nikhil Joshi, Parsa Mahmoudieh, Paul Jhun, Sara Wiltberger, Shakir Mohamed, et al. (21 additional authors not shown)

Abstract: Today's generative AI systems are tuned to present information by default rather than engage users in service of learning as a human tutor would. To address the wide range of potential education use cases for these systems, we reframe the challenge of injecting pedagogical behavior as one of \textit{pedagogical instruction following}, where training and evaluation examples include system-level instructions describing the specific pedagogy attributes present or desired in subsequent model turns. This framing avoids committing our models to any particular definition of pedagogy, and instead allows teachers or developers to specify desired model behavior. It also clears a path to improving Gemini models for learning -- by enabling the addition of our pedagogical data to post-training mixtures -- alongside their rapidly expanding set of capabilities. Both represent important changes from our initial tech report. We show how training with pedagogical instruction following produces a LearnLM model (available on Google AI Studio) that is preferred substantially by expert raters across a diverse set of learning scenarios, with average preference strengths of 31\% over GPT-4o, 11\% over Claude 3.5, and 13\% over the Gemini 1.5 Pro model LearnLM was based on.

Submitted 25 December, 2024; v1 submitted 20 December, 2024; originally announced December 2024.

arXiv:2412.16082 [pdf, other]

Bounds on concatenated entanglement-assisted quantum error-correcting codes

Authors: Nihar Ranjan Dash, Sanjoy Dutta, R. Srikanth, Subhashish Banerjee

Abstract: Entanglement-assisted quantum error-correcting codes (EAQECCs) make use of pre-shared entanglement to enhance the rate of error correction and communication. We study the concatenation of EAQECCs, specifically showing how the order of concatenation affects the number of ebits consumed, the logical error probability, the pseudo-threshold, and the violation of the quantum Hamming bound. We find that if the quaternary code from which an EAQECC is derived saturates the Griesmer (resp., Plotkin) bound, then the derived code will saturate the Griesmer (resp., linear Plotkin) bound for EAQECCs. We present families of concatenated EAQECCs that saturate the quantum Singleton, Griesmer, and linear Plotkin bounds for EAQECCs.

Submitted 20 December, 2024; originally announced December 2024.

Comments: 9 pages, 6 figures

arXiv:2412.07416 [pdf]

SHAPE -- A Spectro-Polarimeter Onboard Propulsion Module of Chandrayaan-3 Mission

Authors: Anuj Nandi, Swapnil Singh, Bhavesh Jaiswal, Anand Jain, Smrati Verma, Reenu Palawat, Ravishankar B. T., Brajpal Singh, Anurag Tyagi, Priyanka Das, Supratik Bose, Supriya Verma, Waghmare Rahul Gautam, Yogesh Prasad K. R., Bijoy Raha, Bhavesh Mendhekar, Sathyanaryana Raju K., Srinivasa Rao Kondapi V., Sumit Kumar, Mukund Kumar Thakur, Vinti Bhatia, Nidhi Sharma, Govinda Rao Yenni, Neeraj Kumar Satya, Venkata Raghavendra, et al. (9 additional authors not shown)

Abstract: SHAPE (Spectro-polarimetry of HAbitable Planet Earth) is an experiment onboard the Chandrayaan-3 Mission, designed to study the spectro-polarimetric signatures of the habitable planet Earth in the near-infrared (NIR) wavelength range (1.0 - 1.7 $μ$m). The spectro-polarimeter is the only scientific payload (experimental in nature) on the Propulsion Module (PM) of the Chandrayaan-3 mission. The instrument is a compact and lightweight spectro-polarimeter with an Acousto-Optic Tunable Filter (AOTF) at its core. The AOTF operates in the frequency range of 80 MHz to 135 MHz with a power of 0.5 - 2.0 Watts. The two output beams (e-beam and o-beam) from the AOTF are focused onto two InGaAs detectors (pixelated, 1D linear array) with the help of focusing optics. The primary (aperture) optics, with a diameter of $\sim$2 mm, collects the NIR light for input to the AOTF, defining the field of view (FOV) of 2.6$^\circ$. The payload has a mass of 4.8 kg and operates at a power of 25 Watts. This manuscript highlights some of the ground-based results, including the post-launch initial performance of the payload while orbiting around the Moon to observe Earth.

Submitted 10 December, 2024; originally announced December 2024.

Comments: Accepted for publication in Journal of Aerospace Sciences and Technologies

arXiv:2412.01321 [pdf]

Physically Constrained 3D Diffusion for Inverse Design of Fiber-reinforced Polymer Composite Materials

Authors: Pei Xu, Yunpeng Wu, Srikanth Pilla, Gang Li, Feng Luo

Abstract: Designing fiber-reinforced polymer composites (FRPCs) with a tailored nonlinear stress-strain response can enable innovative applications across various industries. Currently, no efforts have achieved the inverse design of FRPCs that target the entire stress-strain curve. Here, we develop PC3D_Diffusion, a 3D spatial diffusion model designed for the inverse design of FRPCs. We generate 1.35 million FRPCs and calculate their stress-strain curves for training. Although the vanilla PC3D_Diffusion can generate visually appealing results, less than 10% of FRPCs generated by the vanilla model are collision-free, in which fibers do not intersect with each other. We then propose a loss-guided, learning-free approach to apply physical constraints during generation. As a result, PC3D_Diffusion can generate high-quality designs with tailored mechanical behaviors while guaranteeing to satisfy the physical constraints. PC3D_Diffusion advances FRPC inverse design and may facilitate the inverse design of other 3D materials, offering potential applications in industries reliant on materials with custom mechanical properties.

Submitted 2 December, 2024; originally announced December 2024.

arXiv:2412.00622 [pdf, other]

Visual Modality Prompt for Adapting Vision-Language Object Detectors

Authors: Heitor R. Medeiros, Atif Belal, Srikanth Muralidharan, Eric Granger, Marco Pedersoli

Abstract: The zero-shot performance of object detectors degrades when tested on different modalities, such as infrared and depth. While recent work has explored image translation techniques to adapt detectors to new modalities, these methods are limited to a single modality and apply only to traditional detectors. Recently, vision-language detectors, such as YOLO-World and Grounding DINO, have shown promising zero-shot capabilities; however, they have not yet been adapted for other visual modalities. Traditional fine-tuning approaches compromise the zero-shot capabilities of the detectors. The visual prompt strategies commonly used for classification with vision-language models apply the same linear prompt translation to each image, making them less effective. To address these limitations, we propose ModPrompt, a visual prompt strategy to adapt vision-language detectors to new modalities without degrading zero-shot performance. In particular, an encoder-decoder visual prompt strategy is proposed, further enhanced by the integration of inference-friendly modality prompt decoupled residual, facilitating a more robust adaptation. Empirical benchmarking results show our method for modality adaptation on two vision-language detectors, YOLO-World and Grounding DINO, and on challenging infrared (LLVIP, FLIR) and depth (NYUv2) datasets, achieving performance comparable to full fine-tuning while preserving the model's zero-shot capability. Code available at: https://github.com/heitorrapela/ModPrompt.

Submitted 14 March, 2025; v1 submitted 30 November, 2024; originally announced December 2024.

arXiv:2411.17172 [pdf]

Superparamagnetic Superparticles for Magnetic Hyperthermia Therapy: Overcoming the Particle Size Limit

Authors: Supun B. Attanayake, Minh Dang Nguyen, Amit Chanda, Javier Alonso, Inaki Orue, T. Randall Lee, Hariharan Srikanth, Manh-Huong Phan

Abstract: Iron oxide (e.g., Fe$_3$O$_4$ or Fe$_2$O$_3$) nanoparticles are promising candidates for a variety of biomedical applications ranging from magnetic hyperthermia therapy to drug delivery and bio-detection, due to their superparamagnetism, non-toxicity, and biodegradability. While particles of small size (below a critical size, ~20 nm) display superparamagnetic behavior at room temperature, these particles tend to penetrate highly sensitive areas of the body such as the Blood-Brain Barrier (BBB), leading to undesired effects. In addition, these particles possess a high probability of retention, which can lead to genotoxicity and biochemical toxicity. Increasing particle size is a means for addressing these problems but also suppresses the superparamagnetism. We have overcome this particle size limit by synthesizing unique polycrystalline iron oxide nanoparticles composed of multiple nanocrystals of 10 to 15 nm size while tuning particle size from 160 to 400 nm. These so-called superparticles preserve superparamagnetic characteristics and exhibit excellent hyperthermia responses. The specific absorption rates (SAR) exceed 250 W/g (HAC = 800 Oe, f = 310 kHz) at a low concentration of 0.5 mg/mL, indicating their capability in cancer treatment with minimum dose. Our study underscores the potential of size-tunable polycrystalline iron oxide superparticles with superparamagnetic properties for advanced biomedical applications and sensing technologies.

Submitted 26 November, 2024; originally announced November 2024.

arXiv:2411.14611 [pdf, other]

CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs

Authors: Alex Mathai, Kranthi Sedamaki, Debeshee Das, Noble Saji Mathews, Srikanth Tamilselvam, Sridhar Chimalakonda, Atul Kumar

Abstract: Machine Learning (ML) for software engineering (SE) has gained prominence due to its ability to significantly enhance the performance of various SE applications. This progress is largely attributed to the development of generalizable source code representations that effectively capture the syntactic and semantic characteristics of code. In recent years, pre-trained transformer-based models, inspired by natural language processing (NLP), have shown remarkable success in SE tasks. However, source code contains structural and semantic properties embedded within its grammar, which can be extracted from structured code-views like the Abstract Syntax Tree (AST), Data-Flow Graph (DFG), and Control-Flow Graph (CFG). These code-views can complement NLP techniques, further improving SE tasks. Unfortunately, there are no flexible frameworks to infuse arbitrary code-views into existing transformer-based models effectively. Therefore, in this work, we propose CodeSAM, a novel scalable framework to infuse multiple code-views into transformer-based models by creating self-attention masks. We use CodeSAM to fine-tune a small language model (SLM) like CodeBERT on the downstream SE tasks of semantic code search, code clone detection, and program classification. Experimental results show that by using this technique, we improve downstream performance when compared to SLMs like GraphCodeBERT and CodeBERT on all three tasks by utilizing individual code-views or a combination of code-views during fine-tuning. We believe that these results are indicative that techniques like CodeSAM can help create compact yet performant code SLMs that fit in resource constrained settings.

Submitted 21 November, 2024; originally announced November 2024.

arXiv:2411.07374 [pdf, ps, other]

Low Degree Local Correction Over the Boolean Cube

Authors: Prashanth Amireddy, Amik Raj Behera, Manaswi Paraashar, Srikanth Srinivasan, Madhu Sudan

Abstract: In this work, we show that the class of multivariate degree-$d$ polynomials mapping $\{0,1\}^{n}$ to any Abelian group $G$ is locally correctable with $\widetilde{O}_{d}((\log n)^{d})$ queries for up to a fraction of errors approaching half the minimum distance of the underlying code. In particular, this result holds even for polynomials over the reals or the rationals, special cases that were previously not known. Further, we show that they are locally list correctable up to a fraction of errors approaching the minimum distance of the code. These results build on and extend the prior work of the authors [ABPSS24] (STOC 2024) who considered the case of linear polynomials and gave analogous results. Low-degree polynomials over the Boolean cube $\{0,1\}^{n}$ arise naturally in Boolean circuit complexity and learning theory, and our work furthers the study of their coding-theoretic properties. Extending the results of [ABPSS24] from linear to higher-degree polynomials involves several new challenges and handling them gives us further insights into properties of low-degree polynomials over the Boolean cube. For local correction, we construct a set of points in the Boolean cube that lie between two exponentially close parallel hyperplanes and is moreover an interpolating set for degree-$d$ polynomials. To show that the class of degree-$d$ polynomials is list decodable up to the minimum distance, we stitch together results on anti-concentration of low-degree polynomials, the Sunflower lemma, and the Footprint bound for counting common zeroes of polynomials. Analyzing the local list corrector of [ABPSS24] for higher degree polynomials involves understanding random restrictions of non-zero degree-$d$ polynomials on a Hamming slice. In particular, we show that a simple random restriction process for reducing the dimension of the Boolean cube is a suitably good sampler for Hamming slices.

Submitted 12 November, 2024; v1 submitted 11 November, 2024; originally announced November 2024.

Comments: 64 pages, To appear in SODA 2025, deleted image files

arXiv:2411.07328 [pdf, ps, other]

Proxy-small objects present compactly generated categories

Authors: Benjamin Briggs, Srikanth B. Iyengar, Greg Stevenson

Abstract: We develop a correspondence between presentations of compactly generated triangulated categories as localizations of derived categories of ring spectra and proxy-small objects, and explore some consequences. In addition, we give a characterization of proxy-smallness in terms of coproduct preservation of the associated corepresentable functor `up to base change'.

Submitted 17 December, 2024; v1 submitted 11 November, 2024; originally announced November 2024.

Comments: 17 pages; minor changes and additions

MSC Class: 18G80; 13D09

arXiv:2411.07264 [pdf, other]

Multi-Document Financial Question Answering using LLMs

Authors: Shalin Shah, Srikanth Ryali, Ramasubbu Venkatesh

Abstract: We propose two new methods for multi-document financial question answering. The first (RAG_SEM) uses semantic tagging and then queries the index to get the context. The second (KG_RAG) is a knowledge-graph-based method that uses semantic tagging and retrieves knowledge graph triples from a graph database as context. KG_RAG uses knowledge graphs constructed by a small model that is fine-tuned via knowledge distillation from a large teacher model. The data consists of 18 10-K reports of Apple, Microsoft, Alphabet, NVIDIA, Amazon and Tesla for the years 2021, 2022 and 2023. The list of questions in the data consists of 111 complex questions, including many esoteric questions that are difficult to answer and whose answers are not completely obvious. As evaluation metrics, we use overall scores as well as segmented scores, including faithfulness, relevance, correctness, similarity, an LLM-based overall score, the ROUGE scores, and a similarity of embeddings. We find that both methods outperform plain RAG significantly. KG_RAG outperforms RAG_SEM in four out of nine metrics.

Submitted 8 November, 2024; originally announced November 2024.

arXiv:2410.21422 [pdf, other]

A Foundation Model for Chemical Design and Property Prediction

Authors: Feiyang Cai, Katelin Hanna, Tianyu Zhu, Tzuen-Rong Tzeng, Yongping Duan, Ling Liu, Srikanth Pilla, Gang Li, Feng Luo

Abstract: Artificial intelligence (AI) has significantly advanced computational chemistry research in various tasks. However, traditional AI methods often rely on task-specific model designs and training, which constrain both the scalability of model size and generalization across different tasks. Here, we introduce ChemFM, a large foundation model specifically developed for chemicals. ChemFM comprises 3 billion parameters and is pre-trained on 178 million molecules using self-supervised causal language modeling to extract generalizable molecular representations. This model can be adapted to diverse downstream chemical applications using either full-parameter or parameter-efficient fine-tuning methods. ChemFM consistently outperforms state-of-the-art task-specific AI models across all tested tasks. Notably, it achieves up to 67.48% performance improvement across 34 property prediction benchmarks, up to 33.80% reduction in mean average deviation between conditioned and actual properties of generated molecules in conditional molecular generation tasks, and up to 3.7% top-1 accuracy improvement across 4 reaction prediction datasets. Moreover, ChemFM demonstrates its superior performance in predicting antibiotic activity and cytotoxicity, highlighting its potential to advance the discovery of novel antibiotics. We anticipate that ChemFM will significantly advance chemistry research by providing a foundation model capable of effectively generalizing across a broad range of tasks with minimal additional training.

Submitted 23 January, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

arXiv:2410.19206 [pdf, other]

Inference time LLM alignment in single and multidomain preference spectrum

Authors: Sadat Shahriar, Zheng Qi, Nikolaos Pappas, Srikanth Doss, Monica Sunkara, Kishaloy Halder, Manuel Mager, Yassine Benajiba

Abstract: Aligning Large Language Models (LLMs) to address subjectivity and nuanced preference levels requires adequate flexibility and control, which can be a resource-intensive and time-consuming procedure. Existing training-time alignment methods require full re-training when a change is needed, and inference-time ones typically require access to the reward model at each inference step. To address these limitations, we introduce an inference-time model alignment method that learns encoded representations of preference dimensions, called \textit{Alignment Vectors} (AV). These representations are computed by subtracting the base model from the aligned model, as in model editing, which enables dynamic adjustment of the model behavior during inference through simple linear operations. Even though the preference dimensions can span various granularity levels, here we focus on three gradual response levels across three specialized domains: medical, legal, and financial, exemplifying its practical potential. This new alignment paradigm introduces adjustable preference knobs during inference, allowing users to tailor their LLM outputs while reducing the inference cost by half compared to the prompt engineering approach. Additionally, we find that AVs are transferable across different fine-tuning stages of the same model, demonstrating their flexibility. AVs also facilitate multidomain, diverse preference alignment, making the process 12x faster than the retraining approach.

Submitted 24 October, 2024; originally announced October 2024.

arXiv:2410.18481 [pdf, other]

Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction

Authors: Sergio Burdisso, Srikanth Madikeri, Petr Motlicek

Abstract: Efficiently deriving structured workflows from unannotated dialogs remains an underexplored and formidable challenge in computational linguistics. Automating this process could significantly accelerate the manual design of workflows in new domains and enable the grounding of large language models in domain-specific flowcharts, enhancing transparency and controllability. In this paper, we introduce Dialog2Flow (D2F) embeddings, which differ from conventional sentence embeddings by mapping utterances to a latent space where they are grouped according to their communicative and informative functions (i.e., the actions they represent). D2F allows for modeling dialogs as continuous trajectories in a latent space with distinct action-related regions. By clustering D2F embeddings, the latent space is quantized, and dialogs can be converted into sequences of region/action IDs, facilitating the extraction of the underlying workflow. To pre-train D2F, we build a comprehensive dataset by unifying twenty task-oriented dialog datasets with normalized per-turn action annotations. We also introduce a novel soft contrastive loss that leverages the semantic information of these actions to guide the representation learning process, showing superior performance compared to standard supervised contrastive loss. Evaluation against various sentence embeddings, including dialog-specific ones, demonstrates that D2F yields superior qualitative and quantitative results across diverse domains.

Submitted 5 November, 2024; v1 submitted 24 October, 2024; originally announced October 2024.

Comments: Accepted to EMNLP 2024 main conference

Journal ref: https://aclanthology.org/2024.emnlp-main.310/

arXiv:2410.14748 [pdf, other]

ETF: An Entity Tracing Framework for Hallucination Detection in Code Summaries

Authors: Kishan Maharaj, Vitobha Munigala, Srikanth G. Tamilselvam, Prince Kumar, Sayandeep Sen, Palani Kodeswaran, Abhijit Mishra, Pushpak Bhattacharyya

Abstract: Recent advancements in large language models (LLMs) have significantly enhanced their ability to understand both natural language and code, driving their use in tasks like natural language-to-code (NL2Code) and code summarization. However, LLMs are prone to hallucination: outputs that stray from intended meanings. Detecting hallucinations in code summarization is especially difficult due to the complex interplay between programming and natural languages. We introduce a first-of-its-kind dataset with $\sim$10K samples, curated specifically for hallucination detection in code summarization. We further propose a novel Entity Tracing Framework (ETF) that a) utilizes static program analysis to identify code entities from the program and b) uses LLMs to map and verify these entities and their intents within generated code summaries. Our experimental analysis demonstrates the effectiveness of the framework, leading to a 0.73 F1 score. This approach provides an interpretable method for detecting hallucinations by grounding entities, allowing us to evaluate summary accuracy.

Submitted 18 December, 2024; v1 submitted 17 October, 2024; originally announced October 2024.

Comments: 11 pages, 6 Figures, 5 Tables

arXiv:2410.13007 [pdf, other]

Codellm-Devkit: A Framework for Contextualizing Code LLMs with Program Analysis Insights

Authors: Rahul Krishna, Rangeet Pan, Raju Pavuluri, Srikanth Tamilselvam, Maja Vukovic, Saurabh Sinha

Abstract: Large Language Models for Code (or code LLMs) are increasingly gaining popularity and capabilities, offering a wide array of functionalities such as code completion, code generation, code summarization, test generation, code translation, and more. To leverage code LLMs to their full potential, developers must provide code-specific contextual information to the models. These are typically derived and distilled using program analysis tools. However, there exists a significant gap--these static analysis tools are often language-specific and come with a steep learning curve, making their effective use challenging. These tools are tailored to specific programming languages, requiring developers to learn and manage multiple tools to cover various aspects of their code base. Moreover, the complexity of configuring and integrating these tools into the existing development environments adds an additional layer of difficulty. This challenge limits the potential benefits that could be gained from more widespread and effective use of static analysis in conjunction with LLMs. To address this challenge, we present codellm-devkit (hereafter, `CLDK'), an open-source library that significantly simplifies the process of performing program analysis at various levels of granularity for different programming languages to support code LLM use cases. As a Python library, CLDK offers developers an intuitive and user-friendly interface, making it incredibly easy to provide rich program analysis context to code LLMs. With this library, developers can effortlessly integrate detailed, code-specific insights that enhance the operational efficiency and effectiveness of LLMs in coding tasks. CLDK is available as an open-source library at https://github.com/IBM/codellm-devkit.

Submitted 16 October, 2024; originally announced October 2024.

arXiv:2410.09047 [pdf, other]

Unraveling and Mitigating Safety Alignment Degradation of Vision-Language Models

Authors: Qin Liu, Chao Shang, Ling Liu, Nikolaos Pappas, Jie Ma, Neha Anna John, Srikanth Doss, Lluis Marquez, Miguel Ballesteros, Yassine Benajiba

Abstract: The safety alignment ability of Vision-Language Models (VLMs) is prone to be degraded by the integration of the vision module compared to its LLM backbone. We investigate this phenomenon, dubbed "safety alignment degradation" in this paper, and show that the challenge arises from the representation gap that emerges when introducing vision modality to VLMs. In particular, we show that the representations of multi-modal inputs shift away from those of text-only inputs, which represent the distribution that the LLM backbone is optimized for. At the same time, the safety alignment capabilities, initially developed within the textual embedding space, do not successfully transfer to this new multi-modal representation space. To reduce safety alignment degradation, we introduce Cross-Modality Representation Manipulation (CMRM), an inference time representation intervention method for recovering the safety alignment ability that is inherent in the LLM backbone of VLMs, while simultaneously preserving the functional capabilities of VLMs. The empirical results show that our framework significantly recovers the alignment ability that is inherited from the LLM backbone with minimal impact on the fluency and linguistic capabilities of pre-trained VLMs even without additional training. Specifically, the unsafe rate of LLaVA-7B on multi-modal input can be reduced from 61.53% to as low as 3.15% with only inference-time intervention. WARNING: This paper contains examples of toxic or harmful language.

Submitted 11 October, 2024; originally announced October 2024.

Comments: Preprint

arXiv:2410.06269 [pdf, other]

A correspondence between Hebbian unlearning and steady states generated by nonequilibrium dynamics

Authors: Agnish Kumar Behera, Matthew Du, Uday Jagadisan, Srikanth Sastry, Madan Rao, Suriyanarayanan Vaikuntanathan

Abstract: The classic paradigms for learning and memory recall focus on strengths of synaptic couplings and how these can be modulated to encode memories. In a previous paper [A. K. Behera, M. Rao, S. Sastry, and S. Vaikuntanathan, Physical Review X 13, 041043 (2023)], we demonstrated how a specific non-equilibrium modification of the dynamics of an associative memory system can lead to an increase in storage capacity. In this work, using analytical theory and computational inference schemes, we show that the dynamical steady state accessed is in fact similar to those accessed after the operation of a classic unsupervised scheme for improving memory recall, Hebbian unlearning or "dreaming". Together, our work suggests how nonequilibrium dynamics can provide an alternative route for controlling the memory encoding and recall properties of a variety of synthetic (neuromorphic) and biological systems.

Submitted 8 October, 2024; originally announced October 2024.

Comments: 15 pages, 13 figures

arXiv:2409.19519 [pdf]

The Discovery of Giant Positive Magnetoresistance in Proximity to Helimagnetic Order in Manganese Phosphide Nanostructured Films

Authors: Nivarthana W. Y. A. Y. Mudiyanselage, Derick DeTellem, Amit Chanda, Anh Tuan Duong, Tzung-En Hsieh, Johannes Frisch, Marcus Bär, Richa Pokharel Madhogaria, Shirin Mozaffari, Hasitha Suriya Arachchige, David Mandrus, Hariharan Srikanth, Sarath Witanachchi, Manh-Huong Phan

Abstract: The study of magnetoresistance (MR) phenomena has been pivotal in advancing magnetic sensors and spintronic devices. Helimagnets present an intriguing avenue for spintronics research. Theoretical predictions suggest that MR magnitude in the helimagnetic (HM) regime surpasses that in the ferromagnetic (FM) regime by over an order of magnitude. However, in metallic helimagnets like manganese phosphide, MR in the HM phase remains modest (10%), limiting its application in MR devices. Here, a groundbreaking approach is presented to achieve a giant low field MR effect in nanostructured manganese phosphide films by leveraging confinement and strain effects along with spin helicity. Unlike the modest MR observed in bulk manganese phosphide single crystals and large grain polycrystalline films, which exhibit a small negative MR in the FM region (2%) increasing to 8% in the HM region across 10-300 K, a grain size-dependent giant positive MR (90%) is discovered near FM to HM transition temperature (110 K), followed by a rapid decline to a negative MR below 55 K in manganese phosphide nanocrystalline films. These findings illuminate a novel strain-mediated spin helicity phenomenon in nanostructured helimagnets, presenting a promising pathway for the development of high-performance MR sensors and spintronic devices through the strategic utilization of confinement and strain effects.

Submitted 28 September, 2024; originally announced September 2024.

arXiv:2409.17384 [pdf, other]

Fatigue failure in glasses under cyclic shear deformation

Authors: Swarnendu Maity, Himangsu Bhaumik, Shivakumar Athani, Srikanth Sastry

Abstract: Solids subjected to repeated cycles of stress or deformation can fail after several cycles, a phenomenon termed fatigue failure. Although intensely investigated for a wide range of materials owing to its obvious practical importance, a microscopic understanding of the initiation of fatigue failure continues to be actively pursued, in particular for soft and amorphous materials. We investigate fatigue failure for glasses subjected to cyclic shear deformation through computer simulations. We show that, approaching the so-called fatigue limit, failure times display a power law divergence, at variance with commonly used functional forms, and exhibit strong dependence on the degree of annealing of the glasses. We explore several measures of damage, based on quantification of plastic rearrangements and on dissipated energy. Strikingly, the fraction of particles that undergo plastic rearrangements, and a percolation transition they undergo, are predictive of failure. We also find a robust power law relationship between accumulated damage, quantified by dissipated energy or non-affine displacements, and the failure times, which permits prediction of failure times based on behavior in the initial cycles. These observations reveal salient new microscopic features of fatigue failure and suggest approaches for developing a full microscopic picture of fatigue failure in amorphous solids.

Submitted 25 September, 2024; originally announced September 2024.

arXiv:2409.15072 [pdf, other]

Evaluating the Usability of LLMs in Threat Intelligence Enrichment

Authors: Sanchana Srikanth, Mohammad Hasanuzzaman, Farah Tasnur Meem

Abstract: Large Language Models (LLMs) have the potential to significantly enhance threat intelligence by automating the collection, preprocessing, and analysis of threat data. However, the usability of these tools is critical to ensure their effective adoption by security professionals. Despite the advanced capabilities of LLMs, concerns about their reliability, accuracy, and potential for generating inaccurate information persist. This study conducts a comprehensive usability evaluation of five LLMs (ChatGPT, Gemini, Cohere, Copilot, and Meta AI), focusing on their user interface design, error handling, learning curve, performance, and integration with existing tools in threat intelligence enrichment. Utilizing a heuristic walkthrough and a user study methodology, we identify key usability issues and offer actionable recommendations for improvement. Our findings aim to bridge the gap between LLM functionality and user experience, thereby promoting more efficient and accurate threat intelligence practices by ensuring these tools are user-friendly and reliable.

Submitted 23 September, 2024; originally announced September 2024.

arXiv:2409.13502 [pdf, other]

Neural Directional Filtering: Far-Field Directivity Control With a Small Microphone Array

Authors: Julian Wechsler, Srikanth Raj Chetupalli, Mhd Modar Halimeh, Oliver Thiergart, Emanuël A. P. Habets

Abstract: Capturing audio signals with specific directivity patterns is essential in speech communication. This study presents a deep neural network (DNN)-based approach to directional filtering, alleviating the need for explicit signal models. More specifically, our proposed method uses a DNN to estimate a single-channel complex mask from the signals of a microphone array. This mask is then applied to a reference microphone to render a signal that exhibits a desired directivity pattern. We investigate the training dataset composition and its effect on the directivity realized by the DNN during inference. Using a relatively small DNN, the proposed method is found to approximate the desired directivity pattern closely. Additionally, it allows for the realization of higher-order directivity patterns using a small number of microphones, which is a difficult task for linear and parametric directional filtering.

Submitted 20 September, 2024; originally announced September 2024.

Comments: Presented at the International Workshop on Acoustic Signal Enhancement (IWAENC), 2024

arXiv:2409.06817 [pdf, other]

Bifurcation Identification for Ultrasound-driven Robotic Cannulation

Authors: Cecilia G. Morales, Dhruv Srikanth, Jack H. Good, Keith A. Dufendach, Artur Dubrawski

Abstract: In trauma and critical care settings, rapid and precise intravascular access is key to patients' survival. Our research aims at ensuring this access, even when skilled medical personnel are not readily available. Vessel bifurcations are anatomical landmarks that can guide the safe placement of catheters or needles during medical procedures. Although ultrasound is advantageous in navigating anatomical landmarks in emergency scenarios due to its portability and safety, to our knowledge no existing algorithm can autonomously extract vessel bifurcations using ultrasound images. This is primarily due to the limited availability of ground truth data, in particular, data from live subjects, needed for training and validating reliable models. Researchers often resort to using data from anatomical phantoms or simulations. We introduce BIFURC, Bifurcation Identification for Ultrasound-driven Robot Cannulation, a novel algorithm that identifies vessel bifurcations and provides optimal needle insertion sites for an autonomous robotic cannulation system. BIFURC integrates expert knowledge with deep learning techniques to efficiently detect vessel bifurcations within the femoral region and can be trained on a limited amount of in-vivo data. We evaluated our algorithm using a medical phantom as well as real-world experiments involving live pigs. In all cases, BIFURC consistently identified bifurcation points and needle insertion locations in alignment with those identified by expert clinicians.

Submitted 10 September, 2024; originally announced September 2024.

Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2024

arXiv:2409.05327 [pdf, other]

ICPR 2024 Competition on Safe Segmentation of Drive Scenes in Unstructured Traffic and Adverse Weather Conditions

Authors: Furqan Ahmed Shaik, Sandeep Nagar, Aiswarya Maturi, Harshit Kumar Sankhla, Dibyendu Ghosh, Anshuman Majumdar, Srikanth Vidapanakal, Kunal Chaudhary, Sunny Manchanda, Girish Varma

Abstract: The ICPR 2024 Competition on Safe Segmentation of Drive Scenes in Unstructured Traffic and Adverse Weather Conditions served as a rigorous platform to evaluate and benchmark state-of-the-art semantic segmentation models under challenging conditions for autonomous driving. Over several months, participants were provided with the IDD-AW dataset, consisting of 5000 high-quality RGB-NIR image pairs, each annotated at the pixel level and captured under adverse weather conditions such as rain, fog, low light, and snow. A key aspect of the competition was the use and improvement of the Safe mean Intersection over Union (Safe mIoU) metric, designed to penalize unsafe incorrect predictions that could be overlooked by traditional mIoU. This innovative metric emphasized the importance of safety in developing autonomous driving systems. The competition showed significant advancements in the field, with participants demonstrating models that excelled in semantic segmentation and prioritized safety and robustness in unstructured and adverse conditions. The results of the competition set new benchmarks in the domain, highlighting the critical role of safety in deploying autonomous vehicles in real-world scenarios. The contributions from this competition are expected to drive further innovation in autonomous driving technology, addressing the critical challenges of operating in diverse and unpredictable environments.

Submitted 9 September, 2024; originally announced September 2024.

Comments: 15 pages, 7 figures, ICPR Competition Paper

arXiv:2409.04204 [pdf, other]

doi 10.1364/JOSAB.541759

Twin-field-based multi-party quantum key agreement

Authors: Venkat Abhignan, R. Srikanth

Abstract: Quantum key distribution (QKD) can secure cryptographic communication between two distant users, as guaranteed by the laws of quantum mechanics rather than computational assumptions. The twin-field scheme, which employs counter-propagated weak coherent light pulses, doubles the secure distance of standard QKD without using quantum repeaters. Here, we study a method to extend the twin-field key distribution protocol to a scheme for multi-party quantum key agreement. We study our protocol's security using a minimum error discrimination analysis and derive the asymptotic key rate based on the entanglement-based source-replacement scheme. We also simulate it on the ANSYS Interconnect platform with optical components to study the protocol's performance in certain practical situations.

Submitted 23 December, 2024; v1 submitted 6 September, 2024; originally announced September 2024.

Comments: Accepted in Journal of the Optical Society of America B (2024)

Journal ref: J. Opt. Soc. Am. B 42, 267-279 (2025)

arXiv:2409.03769 [pdf, other]

Representation Learning of Complex Assemblies, An Effort to Improve Corporate Scope 3 Emissions Calculation

Authors: Ajay Chatterjee, Srikanth Ranganathan

Abstract: Climate change is a pressing global concern for governments, corporations, and citizens alike. This concern underscores the necessity for these entities to accurately assess the climate impact of manufacturing goods and providing services. Tools like process life cycle analysis (pLCA) are used to evaluate the climate impact of production, use, and disposal, from raw material mining through end-of-life. pLCA further enables practitioners to look deeply into material choices or manufacturing processes for individual parts, sub-assemblies, assemblies, and the final product. Reliable and detailed data on the life cycle stages and processes of the product or service under study are not always available or accessible, resulting in inaccurate assessment of climate impact. To overcome the data limitation and enhance the effectiveness of pLCA to generate an improved environmental impact profile, we are adopting an innovative strategy to identify alternative parts, products, and components that share similarities in terms of their form, function, and performance to serve as qualified substitutes. Focusing on enterprise electronics hardware, we propose a semi-supervised learning-based framework to identify substitute parts that leverages product bill of material (BOM) data and a small amount of component-level qualified substitute data (positive samples) to generate a machine knowledge graph (MKG) and learn effective embeddings of the components that constitute electronic hardware. Our methodology is grounded in attributed graph embeddings and introduces a strategy to generate biased negative samples to significantly enhance the training process. We demonstrate improved performance and generalization over existing published models.

Submitted 21 August, 2024; originally announced September 2024.

arXiv:2409.00856 [pdf, other]

Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages

Authors: William Zhang, Maria Leon, Ryan Xu, Adrian Cardenas, Amelia Wissink, Hanna Martin, Maya Srikanth, Kaya Dorogi, Christian Valadez, Pedro Perez, Citlalli Grijalva, Corey Zhang, Mark Santolucito

Abstract: Node-based programming languages are increasingly popular in media arts coding domains. These languages are designed to be accessible to users with limited coding experience, allowing them to achieve creative output without an extensive programming background. Using LLM-based code generation to further lower the barrier to creative output is an exciting opportunity. However, the best strategy for code generation for visual node-based programming languages is still an open question. In particular, such languages have multiple levels of representation in text, each of which may be used for code generation. In this work, we explore the performance of LLM code generation in audio programming tasks in visual programming languages at multiple levels of representation. We explore code generation through metaprogramming code representations for these languages (i.e., coding the language using a different high-level text-based programming language), as well as through direct node generation with JSON. We evaluate code generated in this way for two visual languages for audio programming on a benchmark set of coding problems. We measure both correctness and complexity of the generated code. We find that metaprogramming results in more semantically correct generated code, given that the code is well-formed (i.e., is syntactically correct and runs). We also find that prompting for richer metaprogramming using randomness and loops led to more complex code.

Submitted 1 September, 2024; originally announced September 2024.

arXiv:2408.07892 [pdf, other]

Personhood credentials: Artificial intelligence and the value of privacy-preserving tools to distinguish who is real online

Authors: Steven Adler, Zoë Hitzig, Shrey Jain, Catherine Brewer, Wayne Chang, Renée DiResta, Eddy Lazzarin, Sean McGregor, Wendy Seltzer, Divya Siddarth, Nouran Soliman, Tobin South, Connor Spelliscy, Manu Sporny, Varya Srivastava, John Bailey, Brian Christian, Andrew Critch, Ronnie Falcon, Heather Flanagan, Kim Hamilton Duffy, Eric Ho, Claire R. Leibowicz, Srikanth Nadhamuni, Alan Z. Rozenshtein, et al. (7 additional authors not shown)

Abstract: Anonymity is an important principle online. However, malicious actors have long used misleading identities to conduct fraud, spread disinformation, and carry out other deceptive schemes. With the advent of increasingly capable AI, bad actors can amplify the potential scale and effectiveness of their operations, intensifying the challenge of balancing anonymity and trustworthiness online. In this paper, we analyze the value of a new tool to address this challenge: "personhood credentials" (PHCs), digital credentials that empower users to demonstrate that they are real people -- not AIs -- to online services, without disclosing any personal information. Such credentials can be issued by a range of trusted institutions -- governments or otherwise. A PHC system, according to our definition, could be local or global, and does not need to be biometrics-based. Two trends in AI contribute to the urgency of the challenge: AI's increasing indistinguishability from people online (i.e., lifelike content and avatars, agentic activity), and AI's increasing scalability (i.e., cost-effectiveness, accessibility). Drawing on a long history of research into anonymous credentials and "proof-of-personhood" systems, personhood credentials give people a way to signal their trustworthiness on online platforms, and offer service providers new tools for reducing misuse by bad actors. In contrast, existing countermeasures to automated deception -- such as CAPTCHAs -- are inadequate against sophisticated AI, while stringent identity verification solutions are insufficiently private for many use-cases. After surveying the benefits of personhood credentials, we also examine deployment risks and design challenges. We conclude with actionable next steps for policymakers, technologists, and standards bodies to consider in consultation with the public.

Submitted 17 January, 2025; v1 submitted 14 August, 2024; originally announced August 2024.

Comments: 63 pages, 7 figures, 5 tables; minor additions to acknowledgments and wording changes for clarity; corrected typo; updated email address reference for author

arXiv:2408.06322 [pdf]

Discovering High-Entropy Oxides with a Machine-Learning Interatomic Potential

Authors: Jacob T. Sivak, Saeed S. I. Almishal, Mary K. Caucci, Yueze Tan, Dhiya Srikanth, Matthew Furst, Long-Quin Chen, Christina M. Rost, Jon-Paul Maria, Susan B. Sinnott

Abstract: High-entropy materials shift the traditional materials discovery paradigm to one that leverages disorder, enabling access to unique chemistries unreachable through enthalpy alone. We present a self-consistent approach integrating computation and experiment to understand and explore single-phase rock salt high-entropy oxides. By leveraging a machine-learning interatomic potential, we rapidly and accurately map high-entropy composition space using our two descriptors: bond length distribution and mixing enthalpy. The single-phase stabilities for all experimentally stabilized rock salt compositions are correctly resolved, with dozens more compositions awaiting discovery.

Submitted 13 January, 2025; v1 submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.04230 [pdf, other]

Enabling Communication via APIs for Mainframe Applications

Authors: Vini Kanvar, Srikanth Tamilselvam, Keerthi Narayan Raghunath

Abstract: For decades, mainframe systems have been vital in enterprise computing, supporting essential applications across industries like banking, retail, and healthcare. To harness these legacy applications and facilitate their reuse, there is increasing interest in using Application Programming Interfaces (APIs) to expose their data and functionalities, enabling the creation of new applications. However, identifying and exposing APIs for various business use cases presents significant challenges, including understanding legacy code, separating dependent components, introducing new artifacts, and making changes without disrupting functionality or compromising key Service Level Agreements (SLAs) like Turnaround Time (TAT). We address these challenges by proposing a novel framework for creating APIs for legacy mainframe applications. Our approach involves identifying APIs by compiling artifacts such as transactions, screens, control flow blocks, inter-microservice calls, business rules, and data accesses. We use static analyses like liveness and reaching definitions to traverse the code and automatically compute API signatures, which include request/response fields. To evaluate our framework, we conducted a qualitative survey with nine mainframe developers, averaging 15 years of experience. This survey helped identify candidate APIs and estimate development time for coding these APIs on a public mainframe application, GENAPP, and two industry mainframe applications. The results showed that our framework effectively identified more candidate APIs and reduced implementation time. The API signature computation is integrated into IBM Watsonx Code Assistant for Z Refactoring Assistant. We verified the correctness of the identified APIs by executing them on an IBM Z mainframe system, demonstrating the practical viability of our approach.

Submitted 8 August, 2024; originally announced August 2024.

arXiv:2407.17505 [pdf]

Survey on biomarkers in human vocalizations

Authors: Aki Härmä, Bert den Brinker, Ulf Grossekathofer, Okke Ouweltjes, Srikanth Nallanthighal, Sidharth Abrol, Vibhu Sharma

Abstract: Recent years have witnessed an increase in technologies that use speech for the sensing of the health of the talker. This survey paper proposes a general taxonomy of the technologies and a broad overview of current progress and challenges. Vocal biomarkers are often secondary measures that are approximating a signal of another sensor or identifying an underlying mental, cognitive, or physiological state. Their measurement involves disturbances and uncertainties that may be considered as noise sources, and the biomarkers are coarsely qualified in terms of the various sources of noise involved in their determination. While in some proposed biomarkers the error levels seem high, there are vocal biomarkers where the errors are expected to be low and thus are more likely to qualify as candidates for adoption in healthcare applications.

Submitted 8 August, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.12955 [pdf]

Symmetry-breaking of turbulent flow due to asymmetric vortex shedding in periodic porous media

Authors: Vishal Srikanth, Andrey V. Kuznetsov

Abstract: In this paper, we report new insight into a symmetry-breaking phenomenon that occurs for turbulent flow in periodic porous media composed of cylindrical solid obstacles with circular cross-section. We have used Large Eddy Simulation to investigate the symmetry-breaking phenomenon by varying the porosity (0.57-0.99) and the pore scale Reynolds number (37-1,000). Asymmetrical flow distribution is observed in the intermediate porosity flow regime for values of porosities between 0.8 and 0.9, which is characterized by the formation of alternating low and high velocity flow channels above and below the solid obstacles. These channels are parallel to the direction of the flow. Correspondingly, the microscale vortices formed behind the solid obstacles exhibit a bias in the shedding direction. The transition from symmetric to asymmetric flow occurs in between the Reynolds numbers of 37 (laminar) and 100 (turbulent). A Hopf bifurcation resulting in unsteady oscillatory laminar flow marks the origin of a secondary flow instability arising from the interaction of the shear layers around the solid obstacle. When turbulence emerges, stochastic phase difference in the vortex wake oscillations caused by the secondary flow instability results in flow symmetry breaking. We note that symmetry breaking does not occur for cylindrical solid obstacles with square cross-section due to the presence of sharp vertices in the solid obstacle surface. At the macroscale level, symmetry-breaking results in residual transverse drag force components acting on the solid obstacle surfaces. Symmetry-breaking promotes attached flow on the solid obstacle surface, which is potentially beneficial for improving transport properties at the solid obstacle surface such as convection heat flux.

Submitted 21 February, 2025; v1 submitted 17 July, 2024; originally announced July 2024.

Comments: 19 pages, 10 figures, 3 pages supplementary material at the end of document

arXiv:2407.11139 [pdf, other]

doi 10.1051/0004-6361/202449666

Globular cluster orbital decay in dwarf galaxies with MOND and CDM: Impact of supernova feedback

Authors: M. Bílek, F. Combes, S. T. Nagesh, M. Hilker

Abstract: Dynamical friction works very differently for Newtonian gravity with dark matter and in modified Newtonian dynamics (MOND). While the absence of dark matter considerably reduces the friction in major galaxy mergers, analytic calculations indicate the opposite for very small perturbations, such as globular clusters (GCs) sinking in dwarf galaxies. Here, we study the decay of GCs in isolated gas-rich dwarf galaxies using simulations with the Phantom of Ramses code, which enables both the Newtonian and the QUMOND MOND gravity. We modeled the GCs as point masses, and we simulated the full hydrodynamics, with star formation and supernovae feedback. We explored whether the fluctuations in gravitational potential caused by the supernovae can prevent GCs from sinking toward the nucleus. For GCs of typical mass or lighter, we find that this indeed works in both Newtonian and MOND simulations. The GC can even make a random walk. However, we find that supernovae cannot prevent massive GCs ($M\geq 4\times10^5\,M_\odot$) from sinking in MOND. The resulting object looks similar to a galaxy with an offset core, which embeds the sunk GC. The problem is much milder in the Newtonian simulations. This result thus favors Newtonian over QUMOND gravity, but we note that it relies on the correctness of the difficult modeling of baryonic feedback. We propose that the fluctuations in the gravitational potential could be responsible for the thickness of the stellar disks of dwarf galaxies and that strong supernova winds in modified gravity can transform dwarf galaxies into ultra-diffuse galaxies.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 19 pages, 14 figures, 8 tables. Accepted for publication in A&A

Journal ref: A&A 690, A119 (2024)

arXiv:2407.10624 [pdf, other]

Exploring the Dyson Ring: Parameters, Stability and Helical Orbit

Authors: Teerth Raval, Dhruv Srikanth

Abstract: A Dyson ring is a hypothetical megastructure that a very advanced civilization would build around a star to harness more of its energy. Satellite propagation is of high priority in such a vast world where distances could very well be measured in astronomical units. We analyze the ring's parameters and stability and propose a stable helical orbit around the Dyson ring influenced by the gravity of the Dyson ring and the Sun. Taking theoretically explainable values for all parameters, we describe our approach to finding this orbit and present the successful simulation of a satellite's flight in this path.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 5 pages, accepted to IEEE SPACE 2024, compliant with EDAS margins

arXiv:2407.06706 [pdf, other]

Exploring Unstructured Environments using Minimal Sensing on Cooperative Nano-Drones

Authors: Pedro Arias-Perez, Alvika Gautam, Miguel Fernandez-Cortizas, David Perez-Saura, Srikanth Saripalli, Pascual Campoy

Abstract: Recent advances have improved autonomous navigation and mapping under payload constraints, but current multi-robot inspection algorithms are unsuitable for nano-drones due to their need for heavy sensors and high computational resources. To address these challenges, we introduce ExploreBug, a novel hybrid frontier range bug algorithm designed to handle limited sensing capabilities for a swarm of nano-drones. This system includes three primary components: a mapping subsystem, an exploration subsystem, and a navigation subsystem. Additionally, an intra-swarm collision avoidance system is integrated to prevent collisions between drones. We validate the efficacy of our approach through extensive simulations and real-world exploration experiments involving up to seven drones in simulations and three in real-world settings, across various obstacle configurations and with a maximum navigation speed of 0.75 m/s. Our tests demonstrate that the algorithm efficiently completes exploration tasks, even with minimal sensing, across different swarm sizes and obstacle densities. Furthermore, our frontier allocation heuristic ensures an equal distribution of explored areas and paths traveled by each drone in the swarm. We publicly release the source code of the proposed system to foster further developments in mapping and exploration using autonomous nano drones.

Submitted 9 July, 2024; originally announced July 2024.

Comments: Submitted to IEEE Robotics and Automation Letters

Showing 51–100 of 950 results for author: Srikanth