Mixed Signals: Understanding Model Disagreement in Multimodal Empathy Detection

Authors: Maya Srikanth, Run Chen, Julia Hirschberg

Abstract: Multimodal models play a key role in empathy detection, but their performance can suffer when modalities provide conflicting cues. To understand these failures, we examine cases where unimodal and multimodal predictions diverge. Using fine-tuned models for text, audio, and video, along with a gated fusion model, we find that such disagreements often reflect underlying ambiguity, as evidenced by annotator uncertainty. Our analysis shows that dominant signals in one modality can mislead fusion when unsupported by others. We also observe that humans, like models, do not consistently benefit from multimodal input. These insights position disagreement as a useful diagnostic signal for identifying challenging examples and improving empathy system robustness.

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2409.00856 [pdf, other]

Benchmarking LLM Code Generation for Audio Programming with Visual Dataflow Languages

Authors: William Zhang, Maria Leon, Ryan Xu, Adrian Cardenas, Amelia Wissink, Hanna Martin, Maya Srikanth, Kaya Dorogi, Christian Valadez, Pedro Perez, Citlalli Grijalva, Corey Zhang, Mark Santolucito

Abstract: Node-based programming languages are increasingly popular in media arts coding domains. These languages are designed to be accessible to users with limited coding experience, allowing them to achieve creative output without an extensive programming background. Using LLM-based code generation to further lower the barrier to creative output is an exciting opportunity. However, the best strategy for code generation for visual node-based programming languages is still an open question. In particular, such languages have multiple levels of representation in text, each of which may be used for code generation. In this work, we explore the performance of LLM code generation in audio programming tasks in visual programming languages at multiple levels of representation. We explore code generation through metaprogramming code representations for these languages (i.e., coding the language using a different high-level text-based programming language), as well as through direct node generation with JSON. We evaluate code generated in this way for two visual languages for audio programming on a benchmark set of coding problems. We measure both correctness and complexity of the generated code. We find that metaprogramming results in more semantically correct generated code, given that the code is well-formed (i.e., is syntactically correct and runs). We also find that prompting for richer metaprogramming using randomness and loops led to more complex code.

Submitted 1 September, 2024; originally announced September 2024.

arXiv:2405.13019 [pdf, other]

A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models

Authors: Mahsa Khoshnoodi, Vinija Jain, Mingye Gao, Malavika Srikanth, Aman Chadha

Abstract: Despite the crucial importance of accelerating text generation in large language models (LLMs) for efficiently producing content, the sequential nature of this process often leads to high inference latency, posing challenges for real-time applications. Various techniques have been proposed and developed to address these challenges and improve efficiency. This paper presents a comprehensive survey of accelerated generation techniques in autoregressive language models, aiming to understand the state-of-the-art methods and their applications. We categorize these techniques into several key areas: speculative decoding, early exiting mechanisms, and non-autoregressive methods. We discuss each category's underlying principles, advantages, limitations, and recent advancements. Through this survey, we aim to offer insights into the current landscape of techniques in LLMs and provide guidance for future research directions in this critical area of natural language processing.

Submitted 24 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

arXiv:2404.19422 [pdf, other]

Efficient Algorithms for Earliest and Fastest Paths in Public Transport Networks

Authors: Mithinti Srikanth, G. Ramakrishna

Abstract: Public transport administrators rely on efficient algorithms for various problems that arise in public transport networks. In particular, our study focused on designing linear-time algorithms for two fundamental path problems: the earliest arrival time (EAT) and the fastest path duration (FPD) on public transportation data. We conduct a comparative analysis with state-of-the-art algorithms. The results are quite promising, indicating substantial efficiency improvements. Specifically, the fastest path problem shows a remarkable 34-fold speedup, while the earliest arrival time problem exhibits an even more impressive 183-fold speedup. These findings highlight the effectiveness of our algorithms to solve EAT and FPD problems in public transport, and eventually help public administrators to enrich the urban transport experience.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2312.02200 [pdf, other]

An Empirical Study of Automated Mislabel Detection in Real World Vision Datasets

Authors: Maya Srikanth, Jeremy Irvin, Brian Wesley Hill, Felipe Godoy, Ishan Sabane, Andrew Y. Ng

Abstract: Major advancements in computer vision can primarily be attributed to the use of labeled datasets. However, acquiring labels for datasets often results in errors which can harm model performance. Recent works have proposed methods to automatically identify mislabeled images, but developing strategies to effectively implement them in real world datasets has been sparsely explored. Towards improved data-centric methods for cleaning real world vision datasets, we first conduct more than 200 experiments carefully benchmarking recently developed automated mislabel detection methods on multiple datasets under a variety of synthetic and real noise settings with varying noise levels. We compare these methods to a Simple and Efficient Mislabel Detector (SEMD) that we craft, and find that SEMD performs similarly to or outperforms prior mislabel detection approaches. We then apply SEMD to multiple real world computer vision datasets and test how dataset size, mislabel removal strategy, and mislabel removal amount further affect model performance after retraining on the cleaned data. With careful design of the approach, we find that mislabel removal leads to per-class performance improvements of up to 8% of a retrained classifier in smaller data regimes.

Submitted 2 December, 2023; originally announced December 2023.

arXiv:2105.06603 [pdf, other]

Adversarial Learning for Zero-Shot Stance Detection on Social Media

Authors: Emily Allaway, Malavika Srikanth, Kathleen McKeown

Abstract: Stance detection on social media can help to identify and understand slanted news or commentary in everyday life. In this work, we propose a new model for zero-shot stance detection on Twitter that uses adversarial learning to generalize across topics. Our model achieves state-of-the-art performance on a number of unseen test topics with minimal computational costs. In addition, we extend zero-shot stance detection to new topics, highlighting future directions for zero-shot transfer.

Submitted 13 May, 2021; originally announced May 2021.

Comments: To appear in NAACL 2021

arXiv:2102.12596 [pdf, other]

Dynamic Social Media Monitoring for Fast-Evolving Online Discussions

Authors: Maya Srikanth, Anqi Liu, Nicholas Adams-Cohen, Jian Cao, R. Michael Alvarez, Anima Anandkumar

Abstract: Tracking and collecting fast-evolving online discussions provides vast data for studying social media usage and its role in people's public lives. However, collecting social media data using a static set of keywords fails to satisfy the growing need to monitor dynamic conversations and to study fast-changing topics. We propose a dynamic keyword search method to maximize the coverage of relevant information in fast-evolving online discussions. The method uses word embedding models to represent the semantic relations between keywords and predictive models to forecast the future time series. We also implement a visual user interface to aid in the decision-making process in each round of keyword updates. This allows for both human-assisted tracking and fully-automated data collection. In simulations using historical #MeToo data in 2017, our human-assisted tracking method outperforms the traditional static baseline method significantly, with a 37.1% higher F-1 score than traditional static monitors in tracking the top trending keywords. We conduct a contemporary case study to cover dynamic conversations about the recent Presidential Inauguration and to test the dynamic data collection system. Our case studies reflect the effectiveness of our process and also point to the potential challenges in future deployment.

Submitted 24 February, 2021; originally announced February 2021.

Comments: Preprint, Under Review

arXiv:2007.06781 [pdf, other]

Vehicle Trajectory Prediction by Transfer Learning of Semi-Supervised Models

Authors: Nick Lamm, Shashank Jaiprakash, Malavika Srikanth, Iddo Drori

Abstract: In this work we show that semi-supervised models for vehicle trajectory prediction significantly improve performance over supervised models on state-of-the-art real-world benchmarks. Moving from supervised to semi-supervised models allows scaling-up by using unlabeled data, increasing the number of images in pre-training from millions to a billion. We perform ablation studies comparing transfer learning of semi-supervised and supervised models while keeping all other factors equal. Within semi-supervised models we compare contrastive learning with teacher-student methods as well as networks predicting a small number of trajectories with networks predicting probabilities over a large trajectory set. Our results using both low-level and mid-level representations of the driving environment demonstrate the applicability of semi-supervised methods for real-world vehicle trajectory prediction.

Submitted 9 October, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

arXiv:2002.09536 [pdf, other]

Image to Language Understanding: Captioning approach

Authors: Madhavan Seshadri, Malavika Srikanth, Mikhail Belov

Abstract: Extracting context from visual representations is of utmost importance in the advancement of Computer Science. Representation of such a format in Natural Language has a huge variety of applications, such as helping the visually impaired. Such an approach is a combination of Computer Vision and Natural Language techniques, which is a hard problem to solve. This project aims to compare different approaches for solving the image captioning problem. Specifically, the focus was on comparing two different types of models: an Encoder-Decoder approach and a Multi-modal approach. In the encoder-decoder approach, inject and merge architectures were compared against a multi-modal image captioning approach based primarily on object detection. These approaches have been compared on the basis of state-of-the-art sentence comparison metrics such as BLEU, GLEU, Meteor, and Rouge on a subset of the Google Conceptual Captions dataset which contains 100k images. On the basis of this comparison, we observed that the best model was the Inception injected encoder model. This best approach has been deployed as a web-based system. On uploading an image, such a system will output the best caption associated with the image.

Submitted 21 February, 2020; originally announced February 2020.

Comments: 8 pages

arXiv:1911.05332 [pdf, other]

Finding Social Media Trolls: Dynamic Keyword Selection Methods for Rapidly-Evolving Online Debates

Authors: Anqi Liu, Maya Srikanth, Nicholas Adams-Cohen, R. Michael Alvarez, Anima Anandkumar

Abstract: Online harassment is a significant social problem. Prevention of online harassment requires rapid detection of harassing, offensive, and negative social media posts. In this paper, we propose the use of word embedding models to identify offensive and harassing social media messages in two aspects: detecting fast-changing topics for more effective data collection and representing word semantics in different domains. We demonstrate with preliminary results that using the GloVe (Global Vectors for Word Representation) model facilitates the discovery of new and relevant keywords to use for data collection and trolling detection. Our paper concludes with a discussion of a research agenda to further develop and test word embedding models for identification of social media harassment and trolling.

Submitted 15 November, 2019; v1 submitted 13 November, 2019; originally announced November 2019.

Comments: AI for Social Good workshop at NeurIPS (2019)

arXiv:1304.3554 [pdf]

Global cognitive radio based communication systems: Space-time communications

Authors: Dr. G. Rama Murthy, M. Srikanth, K. Viswanadh

Abstract: Spectrum scarcity is a global problem. This paper emphasizes the fact that a global problem has to be dealt with on a global basis, not just locally, by applying the principle of global cognitive radio, Global Opportunistic Remote Spectrum Access. The Future Internet and Internet of Things literally scare the communication system designer regarding the available bandwidth and spectrum resources. There is absolutely no scope to waste or underutilize the available resources. Hence the proposed idea of the Global Cognitive Radio Concept can undoubtedly solve the resource problems in next-generation communications.

Submitted 12 April, 2013; originally announced April 2013.

Comments: 9 pages, 4 figures

arXiv:1012.0084 [pdf]

doi 10.5121/ijcses.2010.1203

Survey on Various Gesture Recognition Techniques for Interfacing Machines Based on Ambient Intelligence

Authors: Harshith C, Karthik R. Shastry, Manoj Ravindran, M. V. V. N. S. Srikanth, Naveen Lakshmikhanth

Abstract: Gesture recognition is mainly concerned with analyzing the functionality of human wits. The main goal of gesture recognition is to create a system which can recognize specific human gestures and use them to convey information or for device control. Hand gestures provide a separate complementary modality to speech for expressing one's ideas. Information associated with hand gestures in a conversation is degree, discourse structure, spatial and temporal structure. The approaches present can be mainly divided into Data-Glove Based and Vision Based approaches. An important face feature point is the nose tip, since the nose is the highest protruding point of the face. Besides that, it is not affected by facial expressions. Another important function of the nose is that it is able to indicate the head pose. Knowledge of the nose location will enable us to align an unknown 3D face with those in a face database. Eye detection is divided into eye position detection and eye contour detection. Existing works in eye detection can be classified into two major categories: traditional image-based passive approaches and the active IR based approaches. The former uses intensity and shape of eyes for detection and the latter works on the assumption that eyes have a reflection under near IR illumination and produce bright/dark pupil effect. The traditional methods can be broadly classified into three categories: template based methods, appearance based methods and feature based methods.
The purpose of this paper is to compare various human gesture recognition systems for interfacing machines directly to human wits without any corporeal media in an ambient environment.

Submitted 30 November, 2010; originally announced December 2010.

Comments: 12 PAGES

MSC Class: 68-02

Showing 1–12 of 12 results for author: Srikanth, M