Understanding Refusal in Language Models with Sparse Autoencoders

Authors: Wei Jie Yeo, Nirmalendu Prakash, Clement Neo, Roy Ka-Wei Lee, Erik Cambria, Ranjan Satapathy

Abstract: Refusal is a key safety behavior in aligned language models, yet the internal mechanisms driving refusals remain opaque. In this work, we conduct a mechanistic study of refusal in instruction-tuned LLMs using sparse autoencoders to identify latent features that causally mediate refusal behaviors. We apply our method to two open-source chat models and intervene on refusal-related features to assess their influence on generation, validating their behavioral impact across multiple harmful datasets. This enables a fine-grained inspection of how refusal manifests at the activation level and addresses key research questions such as investigating upstream-downstream latent relationships and understanding the mechanisms of adversarial jailbreaking techniques. We also establish the usefulness of refusal features in enhancing generalization for linear probes to out-of-distribution adversarial samples in classification tasks. We open-source our code at https://github.com/wj210/refusal_sae.

Submitted 29 May, 2025; originally announced May 2025.

arXiv:2505.14685 [pdf, ps, other]

Language Models use Lookbacks to Track Beliefs

Authors: Nikhil Prakash, Natalie Shapira, Arnab Sen Sharma, Christoph Riedl, Yonatan Belinkov, Tamar Rott Shaham, David Bau, Atticus Geiger

Abstract: How do language models (LMs) represent characters' beliefs, especially when those beliefs may differ from reality? This question lies at the heart of understanding the Theory of Mind (ToM) capabilities of LMs. We analyze Llama-3-70B-Instruct's ability to reason about characters' beliefs using causal mediation and abstraction. We construct a dataset that consists of simple stories where two characters each separately change the state of two objects, potentially unaware of each other's actions. Our investigation uncovered a pervasive algorithmic pattern that we call a lookback mechanism, which enables the LM to recall important information when it becomes necessary. The LM binds each character-object-state triple together by co-locating reference information about them, represented as their Ordering IDs (OIs) in low rank subspaces of the state token's residual stream. When asked about a character's beliefs regarding the state of an object, the binding lookback retrieves the corresponding state OI and then an answer lookback retrieves the state token. When we introduce text specifying that one character is (not) visible to the other, we find that the LM first generates a visibility ID encoding the relation between the observing and the observed character OIs. In a visibility lookback, this ID is used to retrieve information about the observed character and update the observing character's beliefs. Our work provides insights into the LM's belief tracking mechanisms, taking a step toward reverse-engineering ToM reasoning in LMs.

Submitted 20 May, 2025; originally announced May 2025.

Comments: 32 pages, 32 figures. Code and data at https://belief.baulab.info/

arXiv:2504.17080 [pdf, other]

Geometric Formulation of Unified Force-Impedance Control on SE(3) for Robotic Manipulators

Authors: Joohwan Seo, Nikhil Potu Surya Prakash, Soomi Lee, Arvind Kruthiventy, Megan Teng, Jongeun Choi, Roberto Horowitz

Abstract: In this paper, we present an impedance control framework on the SE(3) manifold, which enables force tracking while guaranteeing passivity. Building upon the unified force-impedance control (UFIC) and our previous work on geometric impedance control (GIC), we develop the geometric unified force impedance control (GUFIC) to account for the SE(3) manifold structure in the controller formulation using a differential geometric perspective. As in the case of the UFIC, the GUFIC utilizes energy tank augmentation for both force-tracking and impedance control to guarantee the manipulator's passivity relative to external forces. This ensures that the end effector maintains safe contact interaction with uncertain environments and tracks a desired interaction force. Moreover, we resolve a non-causal implementation problem in the UFIC formulation by introducing velocity and force fields. Due to its formulation on SE(3), the proposed GUFIC inherits the desirable SE(3) invariance and equivariance properties of the GIC, which helps increase sample efficiency in machine learning applications where a learning algorithm is incorporated into the control law. The proposed control law is validated in a simulation environment under scenarios requiring tracking an SE(3) trajectory, incorporating both position and orientation, while exerting a force on a surface. The code is available at https://github.com/Joohwan-Seo/GUFIC_mujoco.

Submitted 23 April, 2025; originally announced April 2025.

Comments: Submitted to Control Decision Conference (CDC) 2025

arXiv:2504.13151 [pdf, ps, other]

MIB: A Mechanistic Interpretability Benchmark

Authors: Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov

Abstract: How can we know whether new mechanistic interpretability methods achieve real improvements? In pursuit of lasting evaluation standards, we propose MIB, a Mechanistic Interpretability Benchmark, with two tracks spanning four tasks and five models. MIB favors methods that precisely and concisely recover relevant causal pathways or causal variables in neural language models. The circuit localization track compares methods that locate the model components - and connections between them - most important for performing a task (e.g., attribution patching or information flow routes). The causal variable localization track compares methods that featurize a hidden vector, e.g., sparse autoencoders (SAEs) or distributed alignment search (DAS), and align those features to a task-relevant causal variable. Using MIB, we find that attribution and mask optimization methods perform best on circuit localization. For causal variable localization, we find that the supervised DAS method performs best, while SAE features are not better than neurons, i.e., non-featurized hidden vectors. These findings illustrate that MIB enables meaningful comparisons, and increases our confidence that there has been real progress in the field.

Submitted 9 June, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

Comments: Accepted to ICML 2025. Project website at https://mib-bench.github.io

arXiv:2503.04429 [pdf, ps, other]

Activation Space Interventions Can Be Transferred Between Large Language Models

Authors: Narmeen Oozeer, Dhruv Nathawani, Nirmalendu Prakash, Michael Lan, Abir Harrasse, Amirali Abdullah

Abstract: The study of representation universality in AI models reveals growing convergence across domains, modalities, and architectures. However, the practical applications of representation universality remain largely unexplored. We bridge this gap by demonstrating that safety interventions can be transferred between models through learned mappings of their shared activation spaces. We demonstrate this approach on two well-established AI safety tasks: backdoor removal and refusal of harmful prompts, showing successful transfer of steering vectors that alter the models' outputs in a predictable way. Additionally, we propose a new task, corrupted capabilities, where models are fine-tuned to embed knowledge tied to a backdoor. This tests their ability to separate useful skills from backdoors, reflecting real-world challenges. Extensive experiments across Llama, Qwen and Gemma model families show that our method enables using smaller models to efficiently align larger ones. Furthermore, we demonstrate that autoencoder mappings between base and fine-tuned models can serve as reliable "lightweight safety switches", allowing dynamic toggling between model behaviors.

Submitted 16 June, 2025; v1 submitted 6 March, 2025; originally announced March 2025.

Comments: 75 pages

arXiv:2408.01416 [pdf, other]

The Quest for the Right Mediator: A History, Survey, and Theoretical Grounding of Causal Interpretability

Authors: Aaron Mueller, Jannik Brinkmann, Millicent Li, Samuel Marks, Koyena Pal, Nikhil Prakash, Can Rager, Aruna Sankaranarayanan, Arnab Sen Sharma, Jiuding Sun, Eric Todd, David Bau, Yonatan Belinkov

Abstract: Interpretability provides a toolset for understanding how and why neural networks behave in certain ways. However, there is little unity in the field: most studies employ ad-hoc evaluations and do not share theoretical foundations, making it difficult to measure progress and compare the pros and cons of different techniques. Furthermore, while mechanistic understanding is frequently discussed, the basic causal units underlying these mechanisms are often not explicitly defined. In this paper, we propose a perspective on interpretability research grounded in causal mediation analysis. Specifically, we describe the history and current state of interpretability taxonomized according to the types of causal units (mediators) employed, as well as methods used to search over mediators. We discuss the pros and cons of each mediator, providing insights as to when particular kinds of mediators and search methods are most appropriate depending on the goals of a given study. We argue that this framing yields a more cohesive narrative of the field, as well as actionable insights for future work. Specifically, we recommend a focus on discovering new mediators with better trade-offs between human-interpretability and compute-efficiency, and which can uncover more sophisticated abstractions from neural networks than the primarily linear mediators employed in current work. We also argue for more standardized evaluations that enable principled comparisons across mediator types, such that we can better understand when particular causal units are better suited to particular use cases.

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2407.14561 [pdf, other]

NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals

Authors: Jaden Fiotto-Kaufman, Alexander R. Loftus, Eric Todd, Jannik Brinkmann, Koyena Pal, Dmitrii Troitskii, Michael Ripa, Adam Belfki, Can Rager, Caden Juang, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Nikhil Prakash, Carla Brodley, Arjun Guha, Jonathan Bell, Byron C. Wallace, David Bau

Abstract: We introduce NNsight and NDIF, technologies that work in tandem to enable scientific study of the representations and computations learned by very large neural networks. NNsight is an open-source system that extends PyTorch to introduce deferred remote execution. The National Deep Inference Fabric (NDIF) is a scalable inference service that executes NNsight requests, allowing users to share GPU resources and pretrained models. These technologies are enabled by the Intervention Graph, an architecture developed to decouple experimental design from model runtime. Together, this framework provides transparent and efficient access to the internals of deep neural networks such as very large language models (LLMs) without imposing the cost or complexity of hosting customized models individually. We conduct a quantitative survey of the machine learning literature that reveals a growing gap in the study of the internals of large-scale AI. We demonstrate the design and use of our framework to address this gap by enabling a range of research methods on huge models. Finally, we conduct benchmarks to compare performance with previous approaches. Code, documentation, and tutorials are available at https://nnsight.net/.

Submitted 1 April, 2025; v1 submitted 18 July, 2024; originally announced July 2024.

Comments: Code at https://nnsight.net

arXiv:2407.13090 [pdf]

Enhanced Denoising of Optical Coherence Tomography Images Using Residual U-Net

Authors: Akkidas Noel Prakash, Jahnvi Sai Ganta, Ramaswami Krishnadas, Tin A. Tunc, Satish K Panda

Abstract: Optical Coherence Tomography (OCT) imaging is pivotal in diagnosing ophthalmic conditions by providing detailed cross-sectional images of the anterior and posterior segments of the eye. Nonetheless, speckle noise and other imaging artifacts inherent to OCT impede the accuracy of diagnosis significantly. In this study, we proposed an enhanced denoising model using a Residual U-Net architecture that effectively diminishes noise and improves image clarity across both Anterior Segment OCT (ASOCT) and polarization-sensitive OCT (PSOCT) images. Our approach demonstrated substantial improvements in image quality metrics: the Peak Signal-to-Noise Ratio (PSNR) was 34.343 ± 1.113 for PSOCT images, and Structural Similarity Index Measure (SSIM) values were 0.885 ± 0.030, indicating enhanced preservation of tissue integrity and textural details. For ASOCT images, we observed the PSNR to be 23.525 ± 0.872 dB and SSIM 0.407 ± 0.044, reflecting significant enhancements in visual quality and structural accuracy. These metrics substantiate the model's efficacy in not only reducing noise but also in maintaining crucial anatomical features, thereby enabling more precise and efficient clinical evaluations. The dual functionality across both ASOCT and PSOCT modalities underscores the versatility and potential for broad application in clinical settings, optimizing diagnostic processes and reducing the necessity for prolonged imaging sessions.

Submitted 24 September, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

arXiv:2406.12347 [pdf, other]

Interpreting Bias in Large Language Models: A Feature-Based Approach

Authors: Nirmalendu Prakash, Lee Ka Wei Roy

Abstract: Large Language Models (LLMs) such as Mistral and LLaMA have showcased remarkable performance across various natural language processing (NLP) tasks. Despite their success, these models inherit social biases from the diverse datasets on which they are trained. This paper investigates the propagation of biases within LLMs through a novel feature-based analytical approach. Drawing inspiration from causal mediation analysis, we hypothesize the evolution of bias-related features and validate them using interpretability techniques like activation and attribution patching. Our contributions are threefold: (1) We introduce and empirically validate a feature-based method for bias analysis in LLMs, applied to LLaMA-2-7B, LLaMA-3-8B, and Mistral-7B-v0.3 with templates from a professions dataset. (2) We extend our method to another form of gender bias, demonstrating its generalizability. (3) We differentiate the roles of MLPs and attention heads in bias propagation and implement targeted debiasing using a counterfactual dataset. Our findings reveal the complex nature of bias in LLMs and emphasize the necessity for tailored debiasing strategies, offering a deeper understanding of bias mechanisms and pathways for effective mitigation.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2405.01842 [pdf, ps, other]

SGHateCheck: Functional Tests for Detecting Hate Speech in Low-Resource Languages of Singapore

Authors: Ri Chi Ng, Nirmalendu Prakash, Ming Shan Hee, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

Abstract: To address the limitations of current hate speech detection models, we introduce SGHateCheck, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and MHC, employing large language models for translation and paraphrasing into Singapore's main languages, and refining these with native annotators. SGHateCheck reveals critical flaws in state-of-the-art models, highlighting their inadequacy in sensitive content moderation. This work aims to foster the development of more effective hate speech detection tools for diverse linguistic environments, particularly for Singapore and Southeast Asia contexts.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2402.14811 [pdf, other]

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

Authors: Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau

Abstract: Fine-tuning on generalized tasks such as instruction following, code generation, and mathematics has been shown to enhance language models' performance on a range of tasks. Nevertheless, explanations of how such fine-tuning influences the internal computations in these models remain elusive. We study how fine-tuning affects the internal mechanisms implemented in language models. As a case study, we explore the property of entity tracking, a crucial facet of language comprehension, where models fine-tuned on mathematics have substantial performance gains. We identify the mechanism that enables entity tracking and show that (i) in both the original model and its fine-tuned versions primarily the same circuit implements entity tracking. In fact, the entity tracking circuit of the original model on the fine-tuned versions performs better than the full original model. (ii) The circuits of all the models implement roughly the same functionality: Entity tracking is performed by tracking the position of the correct entity in both the original model and its fine-tuned versions. (iii) Performance boost in the fine-tuned models is primarily attributed to its improved ability to handle the augmented positional information. To uncover these findings, we employ: Path Patching, DCM, which automatically detects model components responsible for specific semantics, and CMAP, a new approach for patching activations across models to reveal improved mechanisms. Our findings suggest that fine-tuning enhances, rather than fundamentally alters, the mechanistic operation of the model.

Submitted 22 February, 2024; originally announced February 2024.

Comments: ICLR 2024. 26 pages, 13 figures. Code and data at https://finetuning.baulab.info/

arXiv:2401.13190 [pdf, other]

A Comparison Between Lie Group- and Lie Algebra- Based Potential Functions for Geometric Impedance Control

Authors: Joohwan Seo, Nikhil Potu Surya Prakash, Jongeun Choi, Roberto Horowitz

Abstract: In this paper, a comparison analysis between geometric impedance controls (GICs) derived from two different potential functions on SE(3) for robotic manipulators is presented. The first potential function is defined on the Lie group, utilizing the Frobenius norm of the configuration error matrix. The second potential function is defined utilizing the Lie algebra, i.e., log-map of the configuration error. Using a differential geometric approach, the detailed derivation of the distance metric and potential function on SE(3) is introduced. The GIC laws are respectively derived from the two potential functions, followed by extensive comparison analyses. In the qualitative analysis, the properties of the error function and control laws are analyzed, while the performances of the controllers are quantitatively compared using numerical simulation.

Submitted 23 January, 2024; originally announced January 2024.

Comments: This paper is accepted to American Control Conference (ACC) 2024

arXiv:2312.09693 [pdf, other]

Prompting Large Language Models for Topic Modeling

Authors: Han Wang, Nirmalendu Prakash, Nguyen Khoi Hoang, Ming Shan Hee, Usman Naseem, Roy Ka-Wei Lee

Abstract: Topic modeling is a widely used technique for revealing underlying thematic structures within textual data. However, existing models have certain limitations, particularly when dealing with short text datasets that lack co-occurring words. Moreover, these models often neglect sentence-level semantics, focusing primarily on token-level semantics. In this paper, we propose PromptTopic, a novel topic modeling approach that harnesses the advanced language understanding of large language models (LLMs) to address these challenges. It involves extracting topics at the sentence level from individual documents, then aggregating and condensing these topics into a predefined quantity, ultimately providing coherent topics for texts of varying lengths. This approach eliminates the need for manual parameter tuning and improves the quality of extracted topics. We benchmark PromptTopic against the state-of-the-art baselines on three vastly diverse datasets, establishing its proficiency in discovering meaningful topics. Furthermore, qualitative analysis showcases PromptTopic's ability to uncover relevant topics in multiple datasets.

Submitted 15 December, 2023; originally announced December 2023.

Comments: 6 pages, 3 figures, IEEE International Conference on Big Data

ACM Class: I.2.7

arXiv:2312.06094 [pdf, other]

doi 10.1145/3581783.3613463

MATK: The Meme Analytical Tool Kit

Authors: Ming Shan Hee, Aditi Kumaresan, Nguyen Khoi Hoang, Nirmalendu Prakash, Rui Cao, Roy Ka-Wei Lee

Abstract: The rise of social media platforms has brought about a new digital culture called memes. Memes, which combine visuals and text, can strongly influence public opinions on social and cultural issues. As a result, people have become interested in categorizing memes, leading to the development of various datasets and multimodal models that show promising results in this field. However, there is currently a lack of a single library that allows for the reproduction, evaluation, and comparison of these models using fair benchmarks and settings. To fill this gap, we introduce the Meme Analytical Tool Kit (MATK), an open-source toolkit specifically designed to support existing memes datasets and cutting-edge multimodal models. MATK aims to assist researchers and engineers in training and reproducing these multimodal models for meme classification tasks, while also providing analysis techniques to gain insights into their strengths and weaknesses. To access MATK, please visit https://github.com/Social-AI-Studio/MATK.

Submitted 10 December, 2023; originally announced December 2023.

Comments: Accepted at ACM Multimedia'23 Open-Source Software Competition Track

ACM Class: I.1.4

arXiv:2312.06093 [pdf, other]

doi 10.1145/3581783.3613836

PromptMTopic: Unsupervised Multimodal Topic Modeling of Memes using Large Language Models

Authors: Nirmalendu Prakash, Han Wang, Nguyen Khoi Hoang, Ming Shan Hee, Roy Ka-Wei Lee

Abstract: The proliferation of social media has given rise to a new form of communication: memes. Memes are multimodal and often contain a combination of text and visual elements that convey meaning, humor, and cultural significance. While meme analysis has been an active area of research, little work has been done on unsupervised multimodal topic modeling of memes, which is important for content moderation, social media analysis, and cultural studies. We propose PromptMTopic, a novel multimodal prompt-based model designed to learn topics from both text and visual modalities by leveraging the language modeling capabilities of large language models. Our model effectively extracts and clusters topics learned from memes, considering the semantic interaction between the text and visual modalities. We evaluate our proposed model through extensive experiments on three real-world meme datasets, which demonstrate its superiority over state-of-the-art topic modeling baselines in learning descriptive topics in memes. Additionally, our qualitative analysis shows that PromptMTopic can identify meaningful and culturally relevant topics from memes. Our work contributes to the understanding of the topics and themes of memes, a crucial form of communication in today's society. Disclaimer: This paper contains sensitive content that may be disturbing to some readers.

Submitted 10 December, 2023; originally announced December 2023.

Comments: Accepted at ACM Multimedia'23 Research Track

ACM Class: I.1.4; I.1.7

arXiv:2311.10322 [pdf, other]

Clustering Techniques for Stable Linear Dynamical Systems with applications to Hard Disk Drives

Authors: Nikhil Potu Surya Prakash, Joohwan Seo, Jongeun Choi, Roberto Horowitz

Abstract: In Robust Control and Data Driven Robust Control design methodologies, multiple plant transfer functions or a family of transfer functions are considered and a common controller is designed such that all the plants that fall into this family are stabilized. Though the plants are stabilized, the controller might be sub-optimal for each of the plants when the variations in the plants are large. This paper presents a way of clustering stable linear dynamical systems for the design of robust controllers within each of the clusters such that the controllers are optimal for each of the clusters. First a k-medoids algorithm for hard clustering will be presented for stable Linear Time Invariant (LTI) systems and then a Gaussian Mixture Models (GMM) clustering for a special class of LTI systems, common for Hard Disk Drive plants, will be presented.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 6 pages, 4 figures
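
The hard-clustering step named in the abstract above can be illustrated with a minimal k-medoids sketch. The abstract does not say which distance between LTI systems the paper uses, so the metric here (largest gap between sampled frequency responses) is an assumption chosen for illustration, as are the helper names `freq_response`, `response_gap`, and `k_medoids` and the first-order example plants.

```python
def freq_response(num, den, omegas):
    """Sample H(jw) = num(jw) / den(jw); coefficients are in descending powers of s."""
    def polyval(coeffs, s):
        acc = 0j
        for c in coeffs:
            acc = acc * s + c
        return acc
    return [polyval(num, 1j * w) / polyval(den, 1j * w) for w in omegas]


def response_gap(h1, h2):
    """Assumed distance (not taken from the paper): largest pointwise gap on the grid."""
    return max(abs(a - b) for a, b in zip(h1, h2))


def k_medoids(dist, k, max_iter=100):
    """Plain alternating k-medoids on a precomputed symmetric distance matrix."""
    n = len(dist)
    medoids = list(range(k))  # naive deterministic initialisation

    def assign(meds):
        # Each system joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # The new medoid is the member with the smallest total distance to its cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Six stable first-order plants H(s) = 1 / (s + a), forming two obvious families.
omegas = [0.1, 1.0, 10.0, 100.0]
poles = [1.0, 1.1, 0.9, 10.0, 11.0, 9.0]
responses = [freq_response([1.0], [1.0, a], omegas) for a in poles]
dist = [[response_gap(p, q) for q in responses] for p in responses]
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Because a medoid is always one of the plants in the data set, each cluster ends up with a concrete representative system, which is one reason a medoid-based method is a natural fit when the objects being clustered are transfer functions rather than points in a vector space.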

arXiv:2310.12609 [pdf, ps, other]

Denoising Heat-inspired Diffusion with Insulators for Collision Free Motion Planning

Authors: Junwoo Chang, Hyunwoo Ryu, Jiwoo Kim, Soochul Yoo, Jongeun Choi, Joohwan Seo, Nikhil Prakash, Roberto Horowitz

Abstract: Diffusion models have risen as a powerful tool in robotics due to their flexibility and multi-modality. While some of these methods effectively address complex problems, they often depend heavily on inference-time obstacle detection and require additional equipment. Addressing these challenges, we present a method that, during inference time, simultaneously generates only reachable goals and plans motions that avoid obstacles, all from a single visual input. Central to our approach is the novel use of a collision-avoiding diffusion kernel for training. Through evaluations against behavior-cloning and classical diffusion models, our framework has proven its robustness. It is particularly effective in multi-modal environments, navigating toward goals and avoiding unreachable ones blocked by obstacles, while ensuring collision avoidance. Project Website: https://sites.google.com/view/denoising-heat-inspired

Submitted 12 February, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: 9 pages, 6 figures

Journal ref: NeurIPS 2023 Workshop on Diffusion Models

arXiv:2308.14984 [pdf, other]

doi 10.1109/LRA.2023.3346748

Contact-rich SE(3)-Equivariant Robot Manipulation Task Learning via Geometric Impedance Control

Authors: Joohwan Seo, Nikhil Potu Surya Prakash, Xiang Zhang, Changhao Wang, Jongeun Choi, Masayoshi Tomizuka, Roberto Horowitz

Abstract: This paper presents a differential geometric control approach that leverages SE(3) group invariance and equivariance to increase transferability in learning robot manipulation tasks that involve interaction with the environment. Specifically, we employ a control law and a learning representation framework that remain invariant under arbitrary SE(3) transformations of the manipulation task definition. Furthermore, the control law and learning representation framework are shown to be SE(3) equivariant when represented relative to the spatial frame. The proposed approach is based on utilizing a recently presented geometric impedance control (GIC) combined with a learning variable impedance control framework, where the gain scheduling policy is trained in a supervised learning fashion from expert demonstrations. A geometrically consistent error vector (GCEV) is fed to a neural network to achieve a gain scheduling policy that remains invariant to arbitrary translations and rotations. A comparison of our proposed control and learning framework with a well-known Cartesian space learning impedance control, equipped with a Cartesian error vector-based gain scheduling policy, confirms the significantly superior learning transferability of our proposed approach. A hardware implementation on a peg-in-hole task is conducted to validate the learning transferability and feasibility of the proposed approach.

Submitted 18 December, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

arXiv:2307.03637 [pdf, other]

Discovering Variable Binding Circuitry with Desiderata

Authors: Xander Davies, Max Nadeau, Nikhil Prakash, Tamar Rott Shaham, David Bau

Abstract: Recent work has shown that computation in language models may be human-understandable, with successful efforts to localize and intervene on both single-unit features and input-output circuits. Here, we introduce an approach which extends causal mediation experiments to automatically identify model components responsible for performing a specific subtask by solely specifying a set of desiderata, or causal attributes of the model components executing that subtask. As a proof of concept, we apply our method to automatically discover shared variable binding circuitry in LLaMA-13B, which retrieves variable values for multiple arithmetic tasks. Our method successfully localizes variable binding to only 9 attention heads (of the 1.6k) and one MLP in the final token's residual stream.

Submitted 7 July, 2023; originally announced July 2023.

arXiv:2305.17911 [pdf, other]

doi 10.1145/3587819.3592545

TotalDefMeme: A Multi-Attribute Meme dataset on Total Defence in Singapore

Authors: Nirmalendu Prakash, Ming Shan Hee, Roy Ka-Wei Lee

Abstract: Total Defence is a defence policy combining and extending the concept of military defence and civil defence. While several countries have adopted total defence as their defence policy, very few studies have investigated its effectiveness. With the rapid proliferation of social media and digitalisation, many social studies have been focused on investigating policy effectiveness through specially curated surveys and questionnaires either through digital media or traditional forms. However, such references may not truly reflect the underlying sentiments about the target policies or initiatives of interest. People are more likely to express their sentiment using communication mediums such as starting topic threads on forums or sharing memes on social media. Using Singapore as a case reference, this study aims to address this research gap by proposing TotalDefMeme, a large-scale multi-modal and multi-attribute meme dataset that captures public sentiments toward Singapore's Total Defence policy. Besides supporting social informatics and public policy analysis of the Total Defence policy, TotalDefMeme can also support many downstream multi-modal machine learning tasks, such as aspect-based stance classification and multi-modal meme clustering. We perform baseline machine learning experiments on TotalDefMeme and evaluate its technical validity, and present possible future interdisciplinary research directions and application scenarios using the dataset as a baseline.

Submitted 29 May, 2023; originally announced May 2023.

Comments: 6 pages. Accepted at ACM MMSys 2023

ACM Class: I.2.7

arXiv:2304.00720 [pdf, other]

Data-Driven Track Following Control for Dual Stage-Actuator Hard Disk Drives

Authors: Nikhil Potu Surya Prakash, Joohwan Seo, Alexander Rose, Roberto Horowitz

Abstract: In this paper, we present a frequency domain data-driven feedback control design methodology for the design of tracking controllers for hard disk drives with a two-stage actuator as a part of the open invited track 'Benchmark Problem on Control System Design of Hard Disk Drive with a Dual-Stage Actuator' in the IFAC World Congress 2023 (Yokohama, Japan). The benchmark models are Compared to the traditional controller design, we improve robustness and avoid model mismatch by using multiple frequency response plant measurements directly instead of plant models. Disturbance rejection and corresponding error minimization are posed as an H2 norm minimization problem with H infinity and H2 norm constraints. H infinity norm constraints are used to shape the closed loop transfer functions and ensure closed loop stability, and H2 norm constraints are used to constrain and/or minimize the variance of relevant.

Submitted 3 April, 2023; originally announced April 2023.

Comments: 7 pages, 10 figures, IFAC World Congress, Yokohama

arXiv:2211.07945 [pdf, ps, other]

doi 10.1016/j.ifacol.2023.10.1581

Geometric Impedance Control on SE(3) for Robotic Manipulators

Authors: Joohwan Seo, Nikhil Potu Surya Prakash, Alexander Rose, Jongeun Choi, Roberto Horowitz

Abstract: After its introduction, impedance control has been utilized as a primary control scheme for robotic manipulation tasks that involve interaction with unknown environments. While impedance control has been extensively studied, the geometric structure of SE(3) for the robotic manipulator itself and its use in formulating a robotic task has not been adequately addressed. In this paper, we propose a differential geometric approach to impedance control. Given a left-invariant error metric in SE(3), the corresponding error vectors in position and velocity are first derived. We then propose the impedance control schemes that adequately account for the geometric structure of the manipulator in SE(3) based on a left-invariant potential function. The closed-loop stabilities for the proposed control schemes are verified using Lyapunov function-based analysis. The proposed control design clearly outperformed a conventional impedance control approach when tracking challenging trajectory profiles.

Submitted 5 March, 2025; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: Presented at IFAC World Congress 2023, Yokohama, Japan

arXiv:2201.00863 [pdf, other]

Adaptive Model Predictive Control of Wheeled Mobile Robots

Authors: Nikhil Potu Surya Prakash, Tamara Perreault, Trevor Voth, Zejun Zhong

Abstract: In this paper, a control algorithm for guiding a two-wheeled mobile robot with unknown inertia to a desired point and orientation using an Adaptive Model Predictive Control (AMPC) framework is presented. The two-wheeled mobile robot is modeled as a knife edge or a skate with nonholonomic kinematic constraints and the dynamical equations are derived using the Lagrangian approach. The inputs at every time instant are obtained from Model Predictive Control (MPC) with a set of nominal parameters which are updated using a recursive least squares algorithm. The efficacy of the algorithm is demonstrated through numerical simulations at the end of the paper.

Submitted 3 January, 2022; originally announced January 2022.

Comments: 5 pages, 7 figures
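
The parameter-update step mentioned in the abstract above, recursive least squares, can be sketched in its simplest scalar form. The measurement model used here (torque equals an unknown inertia times angular acceleration), the function name `rls_scalar`, and all numerical values are assumptions made for illustration; the abstract does not give the paper's regressor or parameterisation.

```python
def rls_scalar(samples, theta0=0.0, p0=1000.0, forgetting=1.0):
    """Scalar recursive least squares for the model y = phi * theta.

    samples    : iterable of (phi, y) pairs, processed one at a time
    theta0     : initial parameter guess
    p0         : initial covariance (large value = low confidence in theta0)
    forgetting : forgetting factor in (0, 1]; 1.0 weights all samples equally
    """
    theta, p = theta0, p0
    for phi, y in samples:
        gain = p * phi / (forgetting + phi * p * phi)   # update gain
        theta += gain * (y - phi * theta)               # correct by the prediction error
        p = (p - gain * phi * p) / forgetting           # shrink the covariance
    return theta


# Hypothetical noise-free data: torque = inertia * angular acceleration.
true_inertia = 2.5
accelerations = [1.0, 2.0, -1.0, 0.5, 3.0]
samples = [(a, true_inertia * a) for a in accelerations]
estimate = rls_scalar(samples)
print(estimate)
```

In an adaptive MPC loop of the kind the abstract describes, an estimate like this would be refreshed at every time step and handed to the predictive controller as its current nominal parameter; the multi-parameter case replaces the scalars with a parameter vector and a covariance matrix.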

arXiv:2012.06161 [pdf, other]

Conceptualization and Framework of Hybrid Intelligence Systems

Authors: Nikhil Prakash, Kory W. Mathewson

Abstract: As artificial intelligence (AI) systems are becoming ubiquitous within our society, issues related to their fairness, accountability, and transparency are increasing rapidly. As a result, researchers are integrating humans with AI systems to build robust and reliable hybrid intelligence systems. However, a proper conceptualization of these systems does not underpin this rapid growth. This article provides a precise definition of hybrid intelligence systems and explains their relation to other similar concepts through our proposed framework and examples from contemporary literature. The framework breaks down the relationship between a human and a machine in terms of the degree of coupling and the directive authority of each party. Finally, we argue that all AI systems are hybrid intelligence systems, so human factors need to be examined at every stage of such systems' lifecycle.

Submitted 11 December, 2020; originally announced December 2020.

Comments: 8 pages, 1 figure, HAMLETS (Human And Machine in-the-Loop Evaluation and Learning Strategies) workshop at Thirty-fourth Conference on Neural Information Processing Systems

arXiv:2011.08013 [pdf, other]

A General Numerical Method to Model Anisotropy in Discretized Bond-Based Peridynamics

Authors: Naveen Prakash

Abstract: This work proposes a novel, general and robust method of determining bond micromoduli for anisotropic linear elastic bond-based peridynamics. The problem of finding a discrete distribution of bond micromoduli that reproduces an anisotropic peridynamic stiffness tensor is cast as a least-squares problem. The proposed numerical method is able to find a distribution of bond micromoduli that is able to exactly reproduce a desired anisotropic stiffness tensor provided conditions of Cauchy's relations are met. Examples of all eight possible elastic material symmetries, from triclinic to isotropic are given and discussed in depth. Parametric studies are conducted to demonstrate that the numerical method is robust enough to handle a variety of horizon sizes, neighborhood shapes, influence functions and lattice rotation effects. Finally, an example problem is presented to demonstrate that the proposed method is physically sound and that the solution agrees with the analytical solution from classical elasticity. The proposed method has great potential for modeling of deformation and fracture in anisotropic materials with bond-based peridynamics.

Submitted 28 May, 2021; v1 submitted 16 November, 2020; originally announced November 2020.

Comments: 56 pages

arXiv:1911.00344 [pdf, other]

Short and Wide Network Paths

Authors: Lavanya Marla, Lav R. Varshney, Devavrat Shah, Nirmal A. Prakash, Michael E. Gale

Abstract: Network flow is a powerful mathematical framework to systematically explore the relationship between structure and function in biological, social, and technological networks. We introduce a new pipelining model of flow through networks where commodities must be transported over single paths rather than split over several paths and recombined. We show this notion of pipelined network flow is optimized using network paths that are both short and wide, and develop efficient algorithms to compute such paths for given pairs of nodes and for all-pairs. Short and wide paths are characterized for many real-world networks. To further demonstrate the utility of this network characterization, we develop novel information-theoretic lower bounds on computation speed in nervous systems due to limitations from anatomical connectivity and physical noise. For the nematode Caenorhabditis elegans, we find these bounds are predictive of biological timescales of behavior. Further, we find the particular C. elegans connectome is globally less efficient for information flow than random networks, but the hub-and-spoke architecture of functional subcircuits is optimal under constraint on number of synapses. This suggests functional subcircuits are a primary organizational principle of this small invertebrate nervous system.

Submitted 1 November, 2019; originally announced November 2019.
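
The tension between path length and path width described in the abstract above can be made concrete with a small sketch. The abstract does not state the paper's exact objective or algorithm, so the approach below is an assumption: for each candidate width threshold it runs a breadth-first search restricted to edges at least that wide, and keeps the resulting (hops, width) pairs that are not dominated. The graph is treated as undirected, and the names `shortest_hops` and `short_wide_frontier` are hypothetical.

```python
from collections import deque


def shortest_hops(adj, src, dst, min_width):
    """BFS hop count from src to dst using only edges with width >= min_width."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v, w in adj.get(u, []):
            if w >= min_width and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None  # dst is unreachable at this width


def short_wide_frontier(edges, src, dst):
    """Return the non-dominated (hops, width) pairs, widest first.

    edges is a list of (u, v, width) tuples for an undirected graph; the width
    of a path is the smallest edge width along it (its bottleneck).
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    frontier = []
    # Lowering the threshold only adds edges, so hop counts can only improve.
    for width in sorted({w for _, _, w in edges}, reverse=True):
        hops = shortest_hops(adj, src, dst, width)
        if hops is not None and (not frontier or hops < frontier[-1][0]):
            frontier.append((hops, width))
    return frontier


edges = [
    ("A", "B", 10), ("B", "C", 10), ("C", "D", 10),  # wide but long route
    ("A", "E", 5), ("E", "D", 5),                    # medium route
    ("A", "D", 1),                                   # short but narrow link
]
frontier = short_wide_frontier(edges, "A", "D")
print(frontier)
```

Each pair on the frontier is one way to trade hops against bottleneck width; which pair is best depends on how the two are weighed, which is exactly the part that the pipelining model in the paper specifies and this sketch leaves open.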

arXiv:1811.07323 [pdf, other]

Nonlinear control of a swinging pendulum on a wheeled mobile robot with nonholonomic constraints

Authors: Nikhil Potu Surya Prakash

Abstract: In this paper, we propose a nonlinear control strategy for swinging up a pendulum to its upright equilibrium position by shaping its swinging energy along with regulating the cart to a desired location. While the base of a usual cart-pole system is restricted to move in a straight line, the present system is allowed to move in the x-y plane with a nonholonomic constraint that its allowable velocity is only along its orientation. A simple time invariant control law has been presented and its effectiveness has been demonstrated using numerical experiments.

Submitted 18 November, 2018; originally announced November 2018.

Comments: 8 pages, 3 figures

arXiv:1805.03727 [pdf, other]

ARES: Adaptive, Reconfigurable, Erasure coded, atomic Storage

Authors: Nicolas Nicolaou, Viveck Cadambe, N. Prakash, Andria Trigeorgi, Kishori M. Konwar, Nancy Lynch, Muriel Medard

Abstract: Atomicity or strong consistency is one of the fundamental, most intuitive, and hardest to provide primitives in distributed shared memory emulations. To ensure survivability, scalability, and availability of a storage service in the presence of failures, traditional approaches for atomic memory emulation, in message passing environments, replicate the objects across multiple servers. Compared to replication based algorithms, erasure code-based atomic memory algorithms have much lower storage and communication costs, but usually, they are harder to design. The difficulty of designing atomic memory algorithms further grows, when the set of servers may be changed to ensure survivability of the service over software and hardware upgrades, while avoiding service interruptions. Atomic memory algorithms for performing server reconfiguration, in the replicated systems, are very few, complex, and are still part of an active area of research; reconfigurations of erasure-code based algorithms are non-existent. In this work, we present ARES, an algorithmic framework that allows reconfiguration of the underlying servers, and is particularly suitable for erasure-code based algorithms emulating atomic objects. ARES introduces new configurations while keeping the service available. To use with ARES we also propose a new, and to our knowledge, the first two-round erasure code based algorithm TREAS, for emulating multi-writer, multi-reader (MWMR) atomic objects in asynchronous, message-passing environments, with near-optimal communication and storage costs.
Our algorithms can tolerate crash failures of any client and some fraction of servers, and yet, guarantee safety and liveness properties. Moreover, by bringing together the advantages of ARES and TREAS, we propose an optimized algorithm where new configurations can be installed without the object values passing through the reconfiguration clients.

Submitted 28 May, 2021; v1 submitted 9 May, 2018; originally announced May 2018.

arXiv:1805.00396 [pdf, other]

Updating Content in Cache-Aided Coded Multicast

Authors: Milad Mahdian, N. Prakash, Muriel Médard, Edmund Yeh

Abstract: Motivated by applications to delivery of dynamically updated, but correlated data in settings such as content distribution networks, and distributed file sharing systems, we study a single source multiple destination network coded multicast problem in a cache-aided network. We focus on models where the caches are primarily located near the destinations, and where the source has no cache. The source observes a sequence of correlated frames, and is expected to do frame-by-frame encoding with no access to prior frames. We present a novel scheme that shows how the caches can be advantageously used to decrease the overall cost of multicast, even though the source encodes without access to past data. Our cache design and update scheme works with any choice of network code designed for a corresponding cache-less network, is largely decentralized, and works for an arbitrary network. We study a convex relaxation of the optimization problem that results from the overall cost function. The results of the optimization problem determine the rate allocation and caching strategies. Numerous simulation results are presented to substantiate the theory developed.

Submitted 1 May, 2018; originally announced May 2018.

Comments: To Appear in IEEE Journal on Selected Areas in Communications: Special Issue on Caching for Communication Systems and Networks

arXiv:1708.05474 [pdf, other]

The Storage vs Repair Bandwidth Trade-off for Multiple Failures in Clustered Storage Networks

Authors: Vitaly Abdrashitov, N. Prakash, Muriel Médard

Abstract: We study the trade-off between storage overhead and inter-cluster repair bandwidth in clustered storage systems, while recovering from multiple node failures within a cluster. A cluster is a collection of $m$ nodes, and there are $n$ clusters. For data collection, we download the entire content from any $k$ clusters. For repair of $t \geq 2$ nodes within a cluster, we take help from $\ell$ local nodes, as well as $d$ helper clusters. We characterize the optimal trade-off under functional repair, and also under exact repair for the minimum storage and minimum inter-cluster bandwidth (MBR) operating points. Our bounds show the following interesting facts: $1)$ When $t|(m-\ell)$ the trade-off is the same as that under $t=1$, and thus there is no advantage in jointly repairing multiple nodes, $2)$ When $t \nmid (m-\ell)$, the optimal file-size at the MBR point under exact repair can be strictly less than that under functional repair. $3)$ Unlike the case of $t=1$, increasing the number of local helper nodes does not necessarily increase the system capacity under functional repair.

Submitted 17 August, 2017; originally announced August 2017.

Comments: Accepted to IEEE Information Theory Workshop (ITW) 2017

arXiv:1703.01286 [pdf, other]

A Layered Architecture for Erasure-Coded Consistent Distributed Storage

Authors: Kishori M. Konwar, N. Prakash, Nancy Lynch, Muriel Medard

Abstract: Motivated by emerging applications to the edge computing paradigm, we introduce a two-layer erasure-coded fault-tolerant distributed storage system offering atomic access for read and write operations. In edge computing, clients interact with an edge-layer of servers that is geographically near; the edge-layer in turn interacts with a back-end layer of servers. The edge-layer provides low latency access and temporary storage for client operations, and uses the back-end layer for persistent storage. Our algorithm, termed Layered Data Storage (LDS) algorithm, offers several features suitable for edge-computing systems, works under asynchronous message-passing environments, supports multiple readers and writers, and can tolerate $f_1 < n_1/2$ and $f_2 < n_2/3$ crash failures in the two layers having $n_1$ and $n_2$ servers, respectively. We use a class of erasure codes known as regenerating codes for storage of data in the back-end layer. The choice of regenerating codes, instead of popular choices like Reed-Solomon codes, not only optimizes the cost of back-end storage, but also helps in optimizing communication cost of read operations, when the value needs to be recreated all the way from the back-end. The two-layer architecture permits a modular implementation of atomicity and erasure-code protocols; the implementation of erasure-codes is mostly limited to interaction between the two layers. We prove liveness and atomicity of LDS, and also compute performance costs associated with read and write operations.
Further, in a multi-object system running $N$ independent instances of LDS, where only a small fraction of the objects undergo concurrent accesses at any point during the execution, the overall storage cost is dominated by that of persistent storage in the back-end layer, and is given by $Θ(N)$.

Submitted 30 May, 2017; v1 submitted 3 March, 2017; originally announced March 2017.

Comments: To appear in ACM PODC 2017

arXiv:1701.04909 [pdf, other]

doi 10.1109/TIT.2018.2806342

The Storage vs Repair-Bandwidth Trade-off for Clustered Storage Systems

Authors: N. Prakash, Vitaly Abdrashitov, Muriel Medard

Abstract: We study a generalization of the setting of regenerating codes, motivated by applications to storage systems consisting of clusters of storage nodes. There are $n$ clusters in total, with $m$ nodes per cluster. A data file is coded and stored across the $mn$ nodes, with each node storing $α$ symbols. For availability of data, we require that the file be retrievable by downloading the entire content from any subset of $k$ clusters. Nodes represent entities that can fail. We distinguish between intra-cluster and inter-cluster bandwidth (BW) costs during node repair. Node-repair in a cluster is accomplished by downloading $β$ symbols each from any set of $d$ other clusters, dubbed remote helper clusters, and also up to $α$ symbols each from any set of $\ell$ surviving nodes, dubbed local helper nodes, in the host cluster. We first identify the optimal trade-off between storage-overhead and inter-cluster repair-bandwidth under functional repair, and also present optimal exact-repair code constructions for a class of parameters. The new trade-off is strictly better than what is achievable via space-sharing existing coding solutions, whenever $\ell > 0$. We then obtain sharp lower bounds on the necessary intra-cluster repair BW to achieve optimal trade-off.
Our bounds reveal the interesting fact that, while it is beneficial to increase the number of local helper nodes $\ell$ in order to improve the storage-vs-inter-cluster-repair-BW trade-off, increasing $\ell$ not only increases intra-cluster BW in the host-cluster, but also increases the intra-cluster BW in the remote helper clusters. We also analyze resilience of the clustered storage system against passive eavesdropping by providing file-size bounds and optimal code constructions.

Submitted 1 February, 2018; v1 submitted 17 January, 2017; originally announced January 2017.

Comments: Accepted for publication in IEEE Transactions on Information Theory

Journal ref: IEEE Transactions on Information Theory (Volume: 64, Issue: 8, Aug. 2018)

arXiv:1606.04467 [pdf, other]

Outer Bounds on the Storage-Repair Bandwidth Tradeoff of Exact-Repair Regenerating Codes

Authors: Birenjith Sasidharan, N. Prakash, M. Nikhil Krishnan, Myna Vajha, Kaushik Senthoor, P. Vijay Kumar

Abstract: In this paper, three outer bounds on the normalized storage-repair bandwidth (S-RB) tradeoff of regenerating codes having parameter set $\{(n,k,d),(α,β)\}$ under the exact-repair (ER) setting are presented. The first outer bound is applicable for every parameter set $(n,k,d)$ and in conjunction with a code construction known as improved layered codes, it characterizes the normalized ER tradeoff for the case $(n,k=3,d=n-1)$. It establishes a non-vanishing gap between the ER and functional-repair (FR) tradeoffs for every $(n,k,d)$. The second bound is an improvement upon an existing bound due to Mohajer et al. and is tighter than the first bound, in a regime away from the Minimum Storage Regenerating (MSR) point. The third bound is for the case of $k=d$, under the linear setting. This outer bound matches with the achievable region of layered codes thereby characterizing the normalized ER tradeoff of linear ER codes when $k=d=n-1$.

Submitted 14 June, 2016; originally announced June 2016.

Comments: Accepted for publication at International Journal of Information and Coding Theory (Special Issue on Information and Coding Theory for Data Storage)

arXiv:1605.05717 [pdf, ps, other]

RADON: Repairable Atomic Data Object in Networks

Authors: Kishori M. Konwar, N. Prakash, Nancy Lynch, Muriel Medard

Abstract: Erasure codes offer an efficient way to decrease storage and communication costs while implementing atomic memory service in asynchronous distributed storage systems. In this paper, we provide erasure-code-based algorithms having the additional ability to perform background repair of crashed nodes. A repair operation of a node in the crashed state is triggered externally, and is carried out by the concerned node via message exchanges with other active nodes in the system. Upon completion of repair, the node re-enters active state, and resumes participation in ongoing and future read, write, and repair operations. To guarantee liveness and atomicity simultaneously, existing works assume either the presence of nodes with stable storage, or presence of nodes that never crash during the execution. We demand neither of these; instead we consider a natural, yet practical network stability condition $N1$ that only restricts the number of nodes in the crashed/repair state during broadcast of any message. We present an erasure-code based algorithm $RADON_C$ that is always live, and guarantees atomicity as long as condition $N1$ holds. In situations when the number of concurrent writes is limited, $RADON_C$ has significantly improved storage and communication cost over a replication-based algorithm $RADON_R$, which also works under $N1$. We further show how a slightly stronger network stability condition $N2$ can be used to construct algorithms that never violate atomicity. The guarantee of atomicity comes at the expense of having an additional phase during the read and write operations.

Submitted 21 November, 2016; v1 submitted 18 May, 2016; originally announced May 2016.

Comments: To be presented at OPODIS 2016

arXiv:1605.01748 [pdf, ps, other]

doi 10.1109/IPDPS.2016.55

Storage-Optimized Data-Atomic Algorithms for Handling Erasures and Errors in Distributed Storage Systems

Authors: Kishori M. Konwar, N. Prakash, Erez Kantor, Nancy Lynch, Muriel Medard, Alexander A. Schwarzmann

Abstract: Erasure codes are increasingly being studied in the context of implementing atomic memory objects in large scale asynchronous distributed storage systems. When compared with the traditional replication based schemes, erasure codes have the potential of significantly lowering storage and communication costs while simultaneously guaranteeing the desired resiliency levels. In this work, we propose the Storage-Optimized Data-Atomic (SODA) algorithm for implementing atomic memory objects in the multi-writer multi-reader setting. SODA uses Maximum Distance Separable (MDS) codes, and is specifically designed to optimize the total storage cost for a given fault-tolerance requirement. For tolerating $f$ server crashes in an $n$-server system, SODA uses an $[n, k]$ MDS code with $k=n-f$, and incurs a total storage cost of $\frac{n}{n-f}$. SODA is designed under the assumption of reliable point-to-point communication channels. The communication costs of a write and a read operation are given by $O(f^2)$ and $\frac{n}{n-f}(δ_w+1)$, respectively, where $δ_w$ denotes the number of writes that are concurrent with the particular read. In comparison with the recent CASGC algorithm, which also uses MDS codes, SODA offers lower storage cost at the price of higher communication cost. We also present a modification of SODA, called SODA$_{\text{err}}$, to handle the case where some of the servers can return erroneous coded elements during a read operation. Specifically, in order to tolerate $f$ server failures and $e$ error-prone coded elements, the SODA$_{\text{err}}$ algorithm uses an $[n, k]$ MDS code such that $k=n-2e-f$. SODA$_{\text{err}}$ also guarantees liveness and atomicity, while maintaining an optimized total storage cost of $\frac{n}{n-f-2e}$.

Submitted 5 May, 2016; originally announced May 2016.

Comments: Accepted for Publication at IEEE IPDPS, 2016
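
The parameter choices quoted in the SODA abstract above can be checked with a few lines of arithmetic. The following is only a minimal sketch of those formulas, not the authors' implementation: the function name `soda_parameters` is hypothetical, and the storage cost is taken to be normalized by the size of the stored object, as the expression $\frac{n}{n-f}$ suggests.

```python
from fractions import Fraction


def soda_parameters(n: int, f: int, e: int = 0) -> tuple[int, Fraction]:
    """Return (k, total storage cost) for an [n, k] MDS code.

    n: number of servers
    f: number of server crashes to tolerate
    e: number of erroneous coded elements to tolerate (0 for plain SODA)

    Hypothetical helper: it only evaluates k = n - f - 2e and n / k,
    the expressions stated in the abstract.
    """
    k = n - f - 2 * e
    if k < 1:
        raise ValueError("need n - f - 2e >= 1 for a valid [n, k] code")
    return k, Fraction(n, k)


if __name__ == "__main__":
    # Plain SODA: 10 servers, tolerate 3 crashes.
    print(soda_parameters(10, 3))
    # SODA_err: 10 servers, tolerate 2 crashes and 1 erroneous element.
    print(soda_parameters(10, 2, 1))
```

For comparison, storing a full copy of the object on each of the $n$ servers would cost $n$ in the same units, so any $k > 1$ is an improvement over full replication.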

arXiv:1605.01105 [pdf, other]

Communication Cost for Updating Linear Functions when Message Updates are Sparse: Connections to Maximally Recoverable Codes

Authors: N. Prakash, Muriel Medard

Abstract: We consider a communication problem in which an update of the source message needs to be conveyed to one or more distant receivers that are interested in maintaining specific linear functions of the source message. The setting is one in which the updates are sparse in nature, and where neither the source nor the receiver(s) is aware of the exact {\em difference vector}, but only know the amount of sparsity that is present in the difference vector. Under this setting, we are interested in devising linear encoding and decoding schemes that minimize the communication cost involved. We show that the optimal solution to this problem is closely related to the notion of maximally recoverable codes (MRCs), which were originally introduced in the context of coding for storage systems. In the context of storage, MRCs guarantee optimal erasure protection when the system is partially constrained to have local parity relations among the storage nodes. In our problem, we show that optimal solutions exist if and only if MRCs of a certain kind (identified by the desired linear functions) exist. We consider point-to-point and broadcast versions of the problem, and identify connections to MRCs under both these settings. For the point-to-point setting, we show that our linear-encoder based achievable scheme is optimal even when non-linear encoding is permitted. The theory is illustrated in the context of updating erasure coded storage nodes. We present examples based on modern storage codes such as the minimum bandwidth regenerating codes.

Submitted 5 August, 2018; v1 submitted 3 May, 2016; originally announced May 2016.

Comments: To Appear in IEEE Transactions on Information Theory

arXiv:1501.03983 [pdf, other]

The Storage-Repair-Bandwidth Trade-off of Exact Repair Linear Regenerating Codes for the Case $d = k = n-1$

Authors: N. Prakash, M. Nikhil Krishnan

Abstract: In this paper, we consider the setting of exact repair linear regenerating codes. Under this setting, we derive a new outer bound on the storage-repair-bandwidth trade-off for the case when $d = k = n -1$, where $(n, k, d)$ are parameters of the regenerating code, with their usual meaning. Taken together with the achievability result of Tian et al. [1], we show that the new outer bound derived here completely characterizes the trade-off for the case of exact repair linear regenerating codes, when $d = k = n -1$. The new outer bound is derived by analyzing the dual code of the linear regenerating code.

Submitted 26 January, 2015; v1 submitted 16 January, 2015; originally announced January 2015.

Comments: Corrected typos, minor editing for better readability

arXiv:1406.6783 [pdf, other]

Evaluation of Codes with Inherent Double Replication for Hadoop

Authors: M. Nikhil Krishnan, N. Prakash, V. Lalitha, Birenjith Sasidharan, P. Vijay Kumar, Srinivasan Narayanamurthy, Ranjit Kumar, Siddhartha Nandi

Abstract: In this paper, we evaluate the efficacy, in a Hadoop setting, of two coding schemes, both possessing an inherent double replication of data. The two coding schemes belong to the class of regenerating and locally regenerating codes respectively, and these two classes are representative of recent advances made in designing codes for the efficient storage of data in a distributed setting. In comparison with triple replication, double replication permits a significant reduction in storage overhead, while delivering good MapReduce performance under moderate workloads. The two coding solutions under evaluation here add only moderately to the storage overhead of double replication, while simultaneously offering reliability levels similar to those of triple replication. One might expect from the property of inherent data duplication that the performance of these codes in executing a MapReduce job would be comparable to that of double replication. However, a second feature of this class of code comes into play here, namely that under both coding schemes analyzed here, multiple blocks from the same coded stripe are required to be stored on the same node. This concentration of data belonging to a single stripe negatively impacts MapReduce execution times. However, much of this effect can be undone by simply adding a larger number of processors per node. Further improvements are possible if one tailors the Map task scheduler to the codes under consideration. We present both experimental and simulation results that validate these observations.

Submitted 26 June, 2014; originally announced June 2014.

Comments: in Proceedings of Usenix HotStorage, Philadelphia, PA, June 2014

arXiv:1401.2422 [pdf, other]

Codes with Locality for Two Erasures

Authors: N. Prakash, V. Lalitha, P. Vijay Kumar

Abstract: In this paper, we study codes with locality that can recover from two erasures via a sequence of two local, parity-check computations. By a local parity-check computation, we mean recovery via a single parity-check equation associated with small Hamming weight. Earlier approaches considered recovery in parallel; the sequential approach allows us to potentially construct codes with improved minimum distance. These codes, which we refer to as locally 2-reconstructible codes, are a natural generalization along one direction, of codes with all-symbol locality introduced by Gopalan \textit{et al.}, in which recovery from a single erasure is considered. By studying the Generalized Hamming Weights of the dual code, we derive upper bounds on the minimum distance of locally 2-reconstructible codes and provide constructions for a family of codes based on Turán graphs, that are optimal with respect to this bound. The minimum distance bound derived here is universal in the sense that no code which permits all-symbol local recovery from $2$ erasures can have larger minimum distance regardless of approach adopted. Our approach also leads to a new bound on the minimum distance of codes with all-symbol locality for the single-erasure case.

Submitted 27 January, 2014; v1 submitted 10 January, 2014; originally announced January 2014.

Comments: 14 pages, 3 figures, Updated for improved readability
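
The distinction the abstract above draws between parallel and sequential recovery can be illustrated with a toy binary example. This is a sketch under stated assumptions, not a construction from the paper: the symbols are bits, each local parity check is an index tuple whose symbols XOR to zero, and the function names and the two checks used below are hypothetical.

```python
def recover_sequential(symbols, checks):
    """Fill erasures (None) one at a time using XOR parity checks.

    symbols: list of 0/1 values, with None marking an erased symbol
    checks:  list of index tuples; the symbols in each tuple XOR to 0
    A check can repair a symbol only when that symbol is its sole erasure.
    """
    symbols = list(symbols)
    progress = True
    while progress and None in symbols:
        progress = False
        for check in checks:
            missing = [i for i in check if symbols[i] is None]
            if len(missing) == 1:
                value = 0
                for i in check:
                    if i != missing[0]:
                        value ^= symbols[i]
                symbols[missing[0]] = value
                progress = True
    return symbols


def recoverable_in_parallel(erased, checks):
    """True if every erased index has a check containing no other erasure."""
    erased = set(erased)
    return all(
        any(i in check and not (set(check) & (erased - {i})) for check in checks)
        for i in erased
    )


if __name__ == "__main__":
    checks = [(0, 1, 2), (2, 3, 4)]       # hypothetical local parity checks
    received = [1, 0, None, None, 0]      # symbols 2 and 3 are erased
    print(recoverable_in_parallel([2, 3], checks))
    print(recover_sequential(received, checks))
```

In this toy setup symbol 3 appears only in a check that also contains the erased symbol 2, so it cannot be repaired independently; once symbol 2 has been restored from the first check, the second check repairs symbol 3.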

arXiv:1302.5021 [pdf, ps, other]

doi 10.1109/JSAC.2013.130406

Linear Coding Schemes for the Distributed Computation of Subspaces

Authors: V. Lalitha, N. Prakash, K. Vinodh, P. Vijay Kumar, S. Sandeep Pradhan

Abstract: Let $X_1, \ldots, X_m$ be a set of $m$ statistically dependent sources over the common alphabet $\mathbb{F}_q$, that are linearly independent when considered as functions over the sample space. We consider a distributed function computation setting in which the receiver is interested in the lossless computation of the elements of an $s$-dimensional subspace $W$ spanned by the elements of the row vector $[X_1, \ldots, X_m]Γ$ in which the $(m \times s)$ matrix $Γ$ has rank $s$. A sequence of three increasingly refined approaches is presented, all based on linear encoders. The first approach uses a common matrix to encode all the sources and a Korner-Marton like receiver to directly compute $W$. The second improves upon the first by showing that it is often more efficient to compute a carefully chosen superspace $U$ of $W$. The superspace is identified by showing that the joint distribution of the $\{X_i\}$ induces a unique decomposition of the set of all linear combinations of the $\{X_i\}$, into a chain of subspaces identified by a normalized measure of entropy. This subspace chain also suggests a third approach, one that employs nested codes. For any joint distribution of the $\{X_i\}$ and any $W$, the sum-rate of the nested code approach is no larger than that under the Slepian-Wolf (SW) approach. Under the SW approach, $W$ is computed by first recovering each of the $\{X_i\}$. For a large class of joint distributions and subspaces $W$, the nested code approach is shown to improve upon SW. Additionally, a class of source distributions and subspaces is identified, for which the nested-code approach is sum-rate optimal.

Submitted 20 February, 2013; originally announced February 2013.

Comments: To appear in IEEE Journal of Selected Areas in Communications (In-Network Computation: Exploring the Fundamental Limits), April 2013

arXiv:1302.0744 [pdf, other]

Explicit MBR All-Symbol Locality Codes

Authors: Govinda M. Kamath, Natalia Silberstein, N. Prakash, Ankit S. Rawat, V. Lalitha, O. Ozan Koyluoglu, P. Vijay Kumar, Sriram Vishwanath

Abstract: Node failures are inevitable in distributed storage systems (DSS). To enable efficient repair when faced with such failures, two main techniques are known: Regenerating codes, i.e., codes that minimize the total repair bandwidth; and codes with locality, which minimize the number of nodes participating in the repair process. This paper focuses on regenerating codes with locality, using pre-coding based on Gabidulin codes, and presents constructions that utilize minimum bandwidth regenerating (MBR) local codes. The constructions achieve maximum resilience (i.e., optimal minimum distance) and have maximum capacity (i.e., maximum rate). Finally, the same pre-coding mechanism can be combined with a subclass of fractional-repetition codes to enable maximum resilience and repair-by-transfer simultaneously.

Submitted 27 May, 2013; v1 submitted 4 February, 2013; originally announced February 2013.

arXiv:1211.1932 [pdf, other]

Codes with Local Regeneration

Authors: Govinda M. Kamath, N. Prakash, V. Lalitha, P. Vijay Kumar

Abstract: Regenerating codes and codes with locality are two schemes that have recently been proposed to ensure data collection and reliability in a distributed storage network. In a situation where one is attempting to repair a failed node, regenerating codes seek to minimize the amount of data downloaded for node repair, while codes with locality attempt to minimize the number of helper nodes accessed. In this paper, we provide several constructions for a class of vector codes with locality in which the local codes are regenerating codes, that enjoy both advantages. We derive an upper bound on the minimum distance of this class of codes and show that the proposed constructions achieve this bound. The constructions include both the cases where the local regenerating codes correspond to the MSR as well as the MBR point on the storage-repair-bandwidth tradeoff curve of regenerating codes. Also included is a performance comparison of various code constructions for fixed block length and minimum distance.

Submitted 4 February, 2013; v1 submitted 8 November, 2012; originally announced November 2012.

Comments: 44 pages, 7 figures. A class of codes termed Uniform Rank Accumulation (URA) codes is introduced and a minimum distance bound is derived when the local codes are URA codes. Also, the results of our earlier arXiv submission (arXiv:1202.2414 [cs.IT]) are included in Section 3 of this version

arXiv:1202.2414 [pdf, ps, other]

Optimal Linear Codes with a Local-Error-Correction Property

Authors: N. Prakash, Govinda M. Kamath, V. Lalitha, P. Vijay Kumar

Abstract: Motivated by applications to distributed storage, Gopalan \textit{et al.} recently introduced the interesting notion of information-symbol locality in a linear code. By this it is meant that each message symbol appears in a parity-check equation associated with small Hamming weight, thereby enabling recovery of the message symbol by examining a small number of other code symbols. This notion is expanded to the case when all code symbols, not just the message symbols, are covered by such "local" parity. In this paper, we extend the results of Gopalan \textit{et al.} so as to permit recovery of an erased code symbol even in the presence of errors in local parity symbols. We present tight bounds on the minimum distance of such codes and exhibit codes that are optimal with respect to the local error-correction property. As a corollary, we obtain an upper bound on the minimum distance of a concatenated code.

Submitted 11 February, 2012; originally announced February 2012.

Comments: 13 pages, Shorter version submitted to ISIT 2012

Showing 1–43 of 43 results for author: Prakash, N