-
Conversation Kernels: A Flexible Mechanism to Learn Relevant Context for Online Conversation Understanding
Authors:
Vibhor Agarwal,
Arjoo Gupta,
Suparna De,
Nishanth Sastry
Abstract:
Understanding online conversations has attracted research attention with the growth of social networks and online discussion forums. Content analysis of posts and replies in online conversations is difficult because each individual utterance is usually short and may implicitly refer to other posts within the same conversation. Thus, understanding individual posts requires capturing the conversatio…
▽ More
Understanding online conversations has attracted research attention with the growth of social networks and online discussion forums. Content analysis of posts and replies in online conversations is difficult because each individual utterance is usually short and may implicitly refer to other posts within the same conversation. Thus, understanding individual posts requires capturing the conversational context and dependencies between different parts of a conversation tree and then encoding the context dependencies between posts and comments/replies into the language model.
To this end, we propose a general-purpose mechanism to discover appropriate conversational context for various aspects about an online post in a conversation, such as whether it is informative, insightful, interesting or funny. Specifically, we design two families of Conversation Kernels, which explore different parts of the neighborhood of a post in the tree representing the conversation and through this, build relevant conversational context that is appropriate for each task being considered. We apply our developed method to conversations crawled from slashdot.org, which allows users to apply highly different labels to posts, such as 'insightful', 'funny', etc., and therefore provides an ideal experimental platform to study whether a framework such as Conversation Kernels is general-purpose and flexible enough to be adapted to disparately different conversation understanding tasks.
△ Less
Submitted 26 May, 2025;
originally announced May 2025.
-
TabSniper: Towards Accurate Table Detection & Structure Recognition for Bank Statements
Authors:
Abhishek Trivedi,
Sourajit Mukherjee,
Rajat Kumar Singh,
Vani Agarwal,
Sriranjani Ramakrishnan,
Himanshu S. Bhatt
Abstract:
Extraction of transaction information from bank statements is required to assess one's financial well-being for credit rating and underwriting decisions. Unlike other financial documents such as tax forms or financial statements, extracting the transaction descriptions from bank statements can provide a comprehensive and recent view into the cash flows and spending patterns. With multiple variatio…
▽ More
Extraction of transaction information from bank statements is required to assess one's financial well-being for credit rating and underwriting decisions. Unlike other financial documents such as tax forms or financial statements, extracting the transaction descriptions from bank statements can provide a comprehensive and recent view into the cash flows and spending patterns. With multiple variations in layout and templates across several banks, extracting transactional level information from different table categories is an arduous task. Existing table structure recognition approaches produce sub optimal results for long, complex tables and are unable to capture all transactions accurately. This paper proposes TabSniper, a novel approach for efficient table detection, categorization and structure recognition from bank statements. The pipeline starts with detecting and categorizing tables of interest from the bank statements. The extracted table regions are then processed by the table structure recognition model followed by a post-processing module to transform the transactional data into a structured and standardised format. The detection and structure recognition architectures are based on DETR, fine-tuned with diverse bank statements along with additional feature enhancements. Results on challenging datasets demonstrate that TabSniper outperforms strong baselines and produces high-quality extraction of transaction information from bank and other financial documents across multiple layouts and templates.
△ Less
Submitted 17 December, 2024;
originally announced December 2024.
-
Rethinking Node Representation Interpretation through Relation Coherence
Authors:
Ying-Chun Lin,
Jennifer Neville,
Cassiano Becker,
Purvanshi Metha,
Nabiha Asghar,
Vipul Agarwal
Abstract:
Understanding node representations in graph-based models is crucial for uncovering biases ,diagnosing errors, and building trust in model decisions. However, previous work on explainable AI for node representations has primarily emphasized explanations (reasons for model predictions) rather than interpretations (mapping representations to understandable concepts). Furthermore, the limited research…
▽ More
Understanding node representations in graph-based models is crucial for uncovering biases ,diagnosing errors, and building trust in model decisions. However, previous work on explainable AI for node representations has primarily emphasized explanations (reasons for model predictions) rather than interpretations (mapping representations to understandable concepts). Furthermore, the limited research that focuses on interpretation lacks validation, and thus the reliability of such methods is unclear. We address this gap by proposing a novel interpretation method-Node Coherence Rate for Representation Interpretation (NCI)-which quantifies how well different node relations are captured in node representations. We also propose a novel method (IME) to evaluate the accuracy of different interpretation methods. Our experimental results demonstrate that NCI reduces the error of the previous best approach by an average of 39%. We then apply NCI to derive insights about the node representations produced by several graph-based methods and assess their quality in unsupervised settings.
△ Less
Submitted 1 November, 2024;
originally announced November 2024.
-
Accelerated Relaxation Engines for Optimizing to Minimum Energy Path
Authors:
Sandra Liz Simon,
Nitin Kaistha,
Vishal Agarwal
Abstract:
In the last few decades, several novel algorithms have been designed for finding critical points on PES and the minimum energy paths connecting them. This has led to considerably improve our understanding of reaction mechanisms and kinetics of the underlying processes. These methods implicitly rely on computation of energy and forces on the PES, which are usually obtained by computationally demand…
▽ More
In the last few decades, several novel algorithms have been designed for finding critical points on PES and the minimum energy paths connecting them. This has led to considerably improve our understanding of reaction mechanisms and kinetics of the underlying processes. These methods implicitly rely on computation of energy and forces on the PES, which are usually obtained by computationally demanding wave-function or density-function based ab initio methods. To mitigate the computational cost, efficient optimization algorithms are needed. Herein, we present two new optimization algorithms: adaptively accelerated relaxation engine (AARE), an enhanced molecular dynamics (MD) scheme, and accelerated conjugate-gradient method (Acc-CG), an improved version of the traditional conjugate gradient (CG) algorithm. We show the efficacy of these algorithms for unconstrained optimization on 2D and 4D test functions. Additionally, we also show the efficacy of these algorithms for optimizing an elastic band of images to the minimum energy path on two analytical potentials (LEPS-I and LEPS-II) and for HCN/CNH isomerization reaction. In all cases, we find that the new algorithms outperforms the standard and popular fast inertial relaxation engine (FIRE).
△ Less
Submitted 29 October, 2024;
originally announced October 2024.
-
MedHalu: Hallucinations in Responses to Healthcare Queries by Large Language Models
Authors:
Vibhor Agarwal,
Yiqiao Jin,
Mohit Chandra,
Munmun De Choudhury,
Srijan Kumar,
Nishanth Sastry
Abstract:
The remarkable capabilities of large language models (LLMs) in language understanding and generation have not rendered them immune to hallucinations. LLMs can still generate plausible-sounding but factually incorrect or fabricated information. As LLM-empowered chatbots become popular, laypeople may frequently ask health-related queries and risk falling victim to these LLM hallucinations, resulting…
▽ More
The remarkable capabilities of large language models (LLMs) in language understanding and generation have not rendered them immune to hallucinations. LLMs can still generate plausible-sounding but factually incorrect or fabricated information. As LLM-empowered chatbots become popular, laypeople may frequently ask health-related queries and risk falling victim to these LLM hallucinations, resulting in various societal and healthcare implications. In this work, we conduct a pioneering study of hallucinations in LLM-generated responses to real-world healthcare queries from patients. We propose MedHalu, a carefully crafted first-of-its-kind medical hallucination dataset with a diverse range of health-related topics and the corresponding hallucinated responses from LLMs with labeled hallucination types and hallucinated text spans. We also introduce MedHaluDetect framework to evaluate capabilities of various LLMs in detecting hallucinations. We also employ three groups of evaluators -- medical experts, LLMs, and laypeople -- to study who are more vulnerable to these medical hallucinations. We find that LLMs are much worse than the experts. They also perform no better than laypeople and even worse in few cases in detecting hallucinations. To fill this gap, we propose expert-in-the-loop approach to improve hallucination detection through LLMs by infusing expert reasoning. We observe significant performance gains for all the LLMs with an average macro-F1 improvement of 6.3 percentage points for GPT-4.
△ Less
Submitted 28 September, 2024;
originally announced September 2024.
-
LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation
Authors:
Archana Swaminathan,
Anubhav Gupta,
Kamal Gupta,
Shishira R. Maiya,
Vatsal Agarwal,
Abhinav Shrivastava
Abstract:
Neural Radiance Fields (NeRFs) have revolutionized the reconstruction of static scenes and objects in 3D, offering unprecedented quality. However, extending NeRFs to model dynamic objects or object articulations remains a challenging problem. Previous works have tackled this issue by focusing on part-level reconstruction and motion estimation for objects, but they often rely on heuristics regardin…
▽ More
Neural Radiance Fields (NeRFs) have revolutionized the reconstruction of static scenes and objects in 3D, offering unprecedented quality. However, extending NeRFs to model dynamic objects or object articulations remains a challenging problem. Previous works have tackled this issue by focusing on part-level reconstruction and motion estimation for objects, but they often rely on heuristics regarding the number of moving parts or object categories, which can limit their practical use. In this work, we introduce LEIA, a novel approach for representing dynamic 3D objects. Our method involves observing the object at distinct time steps or "states" and conditioning a hypernetwork on the current state, using this to parameterize our NeRF. This approach allows us to learn a view-invariant latent representation for each state. We further demonstrate that by interpolating between these states, we can generate novel articulation configurations in 3D space that were previously unseen. Our experimental results highlight the effectiveness of our method in articulating objects in a manner that is independent of the viewing angle and joint configuration. Notably, our approach outperforms previous methods that rely on motion information for articulation registration.
△ Less
Submitted 10 September, 2024;
originally announced September 2024.
-
CodeMirage: Hallucinations in Code Generated by Large Language Models
Authors:
Vibhor Agarwal,
Yulong Pei,
Salwa Alamir,
Xiaomo Liu
Abstract:
Large Language Models (LLMs) have shown promising potentials in program generation and no-code automation. However, LLMs are prone to generate hallucinations, i.e., they generate text which sounds plausible but is incorrect. Although there has been a recent surge in research on LLM hallucinations for text generation, similar hallucination phenomenon can happen in code generation. Sometimes the gen…
▽ More
Large Language Models (LLMs) have shown promising potentials in program generation and no-code automation. However, LLMs are prone to generate hallucinations, i.e., they generate text which sounds plausible but is incorrect. Although there has been a recent surge in research on LLM hallucinations for text generation, similar hallucination phenomenon can happen in code generation. Sometimes the generated code can have syntactical or logical errors as well as more advanced issues like security vulnerabilities, memory leaks, etc. Given the wide adaptation of LLMs to enhance efficiency in code generation and development in general, it becomes imperative to investigate hallucinations in code generation. To the best of our knowledge, this is the first attempt at studying hallucinations in the code generated by LLMs. We start by introducing the code hallucination definition and a comprehensive taxonomy of code hallucination types. We propose the first benchmark CodeMirage dataset for code hallucinations. The benchmark contains 1,137 GPT-3.5 generated hallucinated code snippets for Python programming problems from two base datasets - HumanEval and MBPP. We then propose the methodology for code hallucination detection and experiment with open source LLMs such as CodeLLaMA as well as OpenAI's GPT-3.5 and GPT-4 models using one-shot prompt. We find that GPT-4 performs the best on HumanEval dataset and gives comparable results to the fine-tuned CodeBERT baseline on MBPP dataset. Towards the end, we discuss various mitigation strategies for code hallucinations and conclude our work.
△ Less
Submitted 14 August, 2024;
originally announced August 2024.
-
Generation and De-Identification of Indian Clinical Discharge Summaries using LLMs
Authors:
Sanjeet Singh,
Shreya Gupta,
Niralee Gupta,
Naimish Sharma,
Lokesh Srivastava,
Vibhu Agarwal,
Ashutosh Modi
Abstract:
The consequences of a healthcare data breach can be devastating for the patients, providers, and payers. The average financial impact of a data breach in recent months has been estimated to be close to USD 10 million. This is especially significant for healthcare organizations in India that are managing rapid digitization while still establishing data governance procedures that align with the lett…
▽ More
The consequences of a healthcare data breach can be devastating for the patients, providers, and payers. The average financial impact of a data breach in recent months has been estimated to be close to USD 10 million. This is especially significant for healthcare organizations in India that are managing rapid digitization while still establishing data governance procedures that align with the letter and spirit of the law. Computer-based systems for de-identification of personal information are vulnerable to data drift, often rendering them ineffective in cross-institution settings. Therefore, a rigorous assessment of existing de-identification against local health datasets is imperative to support the safe adoption of digital health initiatives in India. Using a small set of de-identified patient discharge summaries provided by an Indian healthcare institution, in this paper, we report the nominal performance of de-identification algorithms (based on language models) trained on publicly available non-Indian datasets, pointing towards a lack of cross-institutional generalization. Similarly, experimentation with off-the-shelf de-identification systems reveals potential risks associated with the approach. To overcome data scarcity, we explore generating synthetic clinical reports (using publicly available and Indian summaries) by performing in-context learning over Large Language Models (LLMs). Our experiments demonstrate the use of generated reports as an effective strategy for creating high-performing de-identification systems with good generalization capabilities.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
What's in the Flow? Exploiting Temporal Motion Cues for Unsupervised Generic Event Boundary Detection
Authors:
Sourabh Vasant Gothe,
Vibhav Agarwal,
Sourav Ghosh,
Jayesh Rajkumar Vachhani,
Pranay Kashyap,
Barath Raj Kandur Raja
Abstract:
Generic Event Boundary Detection (GEBD) task aims to recognize generic, taxonomy-free boundaries that segment a video into meaningful events. Current methods typically involve a neural model trained on a large volume of data, demanding substantial computational power and storage space. We explore two pivotal questions pertaining to GEBD: Can non-parametric algorithms outperform unsupervised neural…
▽ More
Generic Event Boundary Detection (GEBD) task aims to recognize generic, taxonomy-free boundaries that segment a video into meaningful events. Current methods typically involve a neural model trained on a large volume of data, demanding substantial computational power and storage space. We explore two pivotal questions pertaining to GEBD: Can non-parametric algorithms outperform unsupervised neural methods? Does motion information alone suffice for high performance? This inquiry drives us to algorithmically harness motion cues for identifying generic event boundaries in videos. In this work, we propose FlowGEBD, a non-parametric, unsupervised technique for GEBD. Our approach entails two algorithms utilizing optical flow: (i) Pixel Tracking and (ii) Flow Normalization. By conducting thorough experimentation on the challenging Kinetics-GEBD and TAPOS datasets, our results establish FlowGEBD as the new state-of-the-art (SOTA) among unsupervised methods. FlowGEBD exceeds the neural models on the Kinetics-GEBD dataset by obtaining an [email protected] score of 0.713 with an absolute gain of 31.7% compared to the unsupervised baseline and achieves an average F1 score of 0.623 on the TAPOS validation dataset.
△ Less
Submitted 15 February, 2024;
originally announced April 2024.
-
Data Science In Olfaction
Authors:
Vivek Agarwal,
Joshua Harvey,
Dmitry Rinberg,
Vasant Dhar
Abstract:
Advances in neural sensing technology are making it possible to observe the olfactory process in great detail. In this paper, we conceptualize smell from a Data Science and AI perspective, that relates the properties of odorants to how they are sensed and analyzed in the olfactory system from the nose to the brain. Drawing distinctions to color vision, we argue that smell presents unique measureme…
▽ More
Advances in neural sensing technology are making it possible to observe the olfactory process in great detail. In this paper, we conceptualize smell from a Data Science and AI perspective, that relates the properties of odorants to how they are sensed and analyzed in the olfactory system from the nose to the brain. Drawing distinctions to color vision, we argue that smell presents unique measurement challenges, including the complexity of stimuli, the high dimensionality of the sensory apparatus, as well as what constitutes ground truth. In the face of these challenges, we argue for the centrality of odorant-receptor interactions in developing a theory of olfaction. Such a theory is likely to find widespread industrial applications, and enhance our understanding of smell, and in the longer-term, how it relates to other senses and language. As an initial use case of the data, we present results using machine learning-based classification of neural responses to odors as they are recorded in the mouse olfactory bulb with calcium imaging.
△ Less
Submitted 8 April, 2024;
originally announced April 2024.
-
Decentralised Moderation for Interoperable Social Networks: A Conversation-based Approach for Pleroma and the Fediverse
Authors:
Vibhor Agarwal,
Aravindh Raman,
Nishanth Sastry,
Ahmed M. Abdelmoniem,
Gareth Tyson,
Ignacio Castro
Abstract:
The recent development of decentralised and interoperable social networks (such as the "fediverse") creates new challenges for content moderators. This is because millions of posts generated on one server can easily "spread" to another, even if the recipient server has very different moderation policies. An obvious solution would be to leverage moderation tools to automatically tag (and filter) po…
▽ More
The recent development of decentralised and interoperable social networks (such as the "fediverse") creates new challenges for content moderators. This is because millions of posts generated on one server can easily "spread" to another, even if the recipient server has very different moderation policies. An obvious solution would be to leverage moderation tools to automatically tag (and filter) posts that contravene moderation policies, e.g. related to toxic speech. Recent work has exploited the conversational context of a post to improve this automatic tagging, e.g. using the replies to a post to help classify if it contains toxic speech. This has shown particular potential in environments with large training sets that contain complete conversations. This, however, creates challenges in a decentralised context, as a single conversation may be fragmented across multiple servers. Thus, each server only has a partial view of an entire conversation because conversations are often federated across servers in a non-synchronized fashion. To address this, we propose a decentralised conversation-aware content moderation approach suitable for the fediverse. Our approach employs a graph deep learning model (GraphNLI) trained locally on each server. The model exploits local data to train a model that combines post and conversational information captured through random walks to detect toxicity. We evaluate our approach with data from Pleroma, a major decentralised and interoperable micro-blogging network containing 2 million conversations. Our model effectively detects toxicity on larger instances, exclusively trained using their local post information (0.8837 macro-F1). Our approach has considerable scope to improve moderation in decentralised and interoperable social networks such as Pleroma or Mastodon.
△ Less
Submitted 16 April, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
TrICy: Trigger-guided Data-to-text Generation with Intent aware Attention-Copy
Authors:
Vibhav Agarwal,
Sourav Ghosh,
Harichandana BSS,
Himanshu Arora,
Barath Raj Kandur Raja
Abstract:
Data-to-text (D2T) generation is a crucial task in many natural language understanding (NLU) applications and forms the foundation of task-oriented dialog systems. In the context of conversational AI solutions that can work directly with local data on the user's device, architectures utilizing large pre-trained language models (PLMs) are impractical for on-device deployment due to a high memory fo…
▽ More
Data-to-text (D2T) generation is a crucial task in many natural language understanding (NLU) applications and forms the foundation of task-oriented dialog systems. In the context of conversational AI solutions that can work directly with local data on the user's device, architectures utilizing large pre-trained language models (PLMs) are impractical for on-device deployment due to a high memory footprint. To this end, we propose TrICy, a novel lightweight framework for an enhanced D2T task that generates text sequences based on the intent in context and may further be guided by user-provided triggers. We leverage an attention-copy mechanism to predict out-of-vocabulary (OOV) words accurately. Performance analyses on E2E NLG dataset (BLEU: 66.43%, ROUGE-L: 70.14%), WebNLG dataset (BLEU: Seen 64.08%, Unseen 52.35%), and our Custom dataset related to text messaging applications, showcase our architecture's effectiveness. Moreover, we show that by leveraging an optional trigger input, data-to-text generation quality increases significantly and achieves the new SOTA score of 69.29% BLEU for E2E NLG. Furthermore, our analyses show that TrICy achieves at least 24% and 3% improvement in BLEU and METEOR respectively over LLMs like GPT-3, ChatGPT, and Llama 2. We also demonstrate that in some scenarios, performance improvement due to triggers is observed even when they are absent in training.
△ Less
Submitted 25 January, 2024;
originally announced February 2024.
-
"Which LLM should I use?": Evaluating LLMs for tasks performed by Undergraduate Computer Science Students
Authors:
Vibhor Agarwal,
Madhav Krishan Garg,
Sahiti Dharmavaram,
Dhruv Kumar
Abstract:
This study evaluates the effectiveness of various large language models (LLMs) in performing tasks common among undergraduate computer science students. Although a number of research studies in the computing education community have explored the possibility of using LLMs for a variety of tasks, there is a lack of comprehensive research comparing different LLMs and evaluating which LLMs are most ef…
▽ More
This study evaluates the effectiveness of various large language models (LLMs) in performing tasks common among undergraduate computer science students. Although a number of research studies in the computing education community have explored the possibility of using LLMs for a variety of tasks, there is a lack of comprehensive research comparing different LLMs and evaluating which LLMs are most effective for different tasks. Our research systematically assesses some of the publicly available LLMs such as Google Bard, ChatGPT(3.5), GitHub Copilot Chat, and Microsoft Copilot across diverse tasks commonly encountered by undergraduate computer science students in India. These tasks include code explanation and documentation, solving class assignments, technical interview preparation, learning new concepts and frameworks, and email writing. Evaluation for these tasks was carried out by pre-final year and final year undergraduate computer science students and provides insights into the models' strengths and limitations. This study aims to guide students as well as instructors in selecting suitable LLMs for any specific task and offers valuable insights on how LLMs can be used constructively by students and instructors.
△ Less
Submitted 3 April, 2024; v1 submitted 22 January, 2024;
originally announced February 2024.
-
Do text-free diffusion models learn discriminative visual representations?
Authors:
Soumik Mukhopadhyay,
Matthew Gwilliam,
Yosuke Yamaguchi,
Vatsal Agarwal,
Namitha Padmanabhan,
Archana Swaminathan,
Tianyi Zhou,
Jun Ohya,
Abhinav Shrivastava
Abstract:
While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which addresses both families of tasks simultaneously. We identify diffusion models, a state-of-the-art method for generative tasks, as a prime candidate. Such models involve training a U-Net to iteratively predict and re…
▽ More
While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which addresses both families of tasks simultaneously. We identify diffusion models, a state-of-the-art method for generative tasks, as a prime candidate. Such models involve training a U-Net to iteratively predict and remove noise, and the resulting model can synthesize high-fidelity, diverse, novel images. We find that the intermediate feature maps of the U-Net are diverse, discriminative feature representations. We propose a novel attention mechanism for pooling feature maps and further leverage this mechanism as DifFormer, a transformer feature fusion of features from different diffusion U-Net blocks and noise steps. We also develop DifFeed, a novel feedback mechanism tailored to diffusion. We find that diffusion models are better than GANs, and, with our fusion and feedback mechanisms, can compete with state-of-the-art unsupervised image representation learning methods for discriminative tasks - image classification with full and semi-supervision, transfer for fine-grained classification, object detection and segmentation, and semantic segmentation. Our project website (https://mgwillia.github.io/diffssl/) and code (https://github.com/soumik-kanad/diffssl) are available publicly.
△ Less
Submitted 24 September, 2024; v1 submitted 29 November, 2023;
originally announced November 2023.
-
GASCOM: Graph-based Attentive Semantic Context Modeling for Online Conversation Understanding
Authors:
Vibhor Agarwal,
Yu Chen,
Nishanth Sastry
Abstract:
Online conversation understanding is an important yet challenging NLP problem which has many useful applications (e.g., hate speech detection). However, online conversations typically unfold over a series of posts and replies to those posts, forming a tree structure within which individual posts may refer to semantic context from higher up the tree. Such semantic cross-referencing makes it difficu…
▽ More
Online conversation understanding is an important yet challenging NLP problem which has many useful applications (e.g., hate speech detection). However, online conversations typically unfold over a series of posts and replies to those posts, forming a tree structure within which individual posts may refer to semantic context from higher up the tree. Such semantic cross-referencing makes it difficult to understand a single post by itself; yet considering the entire conversation tree is not only difficult to scale but can also be misleading as a single conversation may have several distinct threads or points, not all of which are relevant to the post being considered. In this paper, we propose a Graph-based Attentive Semantic COntext Modeling (GASCOM) framework for online conversation understanding. Specifically, we design two novel algorithms that utilise both the graph structure of the online conversation as well as the semantic information from individual posts for retrieving relevant context nodes from the whole conversation. We further design a token-level multi-head graph attention mechanism to pay different attentions to different tokens from different selected context utterances for fine-grained conversation context modeling. Using this semantic conversational context, we re-examine two well-studied problems: polarity prediction and hate speech detection. Our proposed framework significantly outperforms state-of-the-art methods on both tasks, improving macro-F1 scores by 4.5% for polarity prediction and by 5% for hate speech detection. The GASCOM context weights also enhance interpretability.
△ Less
Submitted 21 October, 2023;
originally announced October 2023.
-
HateRephrase: Zero- and Few-Shot Reduction of Hate Intensity in Online Posts using Large Language Models
Authors:
Vibhor Agarwal,
Yu Chen,
Nishanth Sastry
Abstract:
Hate speech has become pervasive in today's digital age. Although there has been considerable research to detect hate speech or generate counter speech to combat hateful views, these approaches still cannot completely eliminate the potential harmful societal consequences of hate speech -- hate speech, even when detected, can often not be taken down or is often not taken down enough; and hate speec…
▽ More
Hate speech has become pervasive in today's digital age. Although there has been considerable research to detect hate speech or generate counter speech to combat hateful views, these approaches still cannot completely eliminate the potential harmful societal consequences of hate speech -- hate speech, even when detected, can often not be taken down or is often not taken down enough; and hate speech unfortunately spreads quickly, often much faster than any generated counter speech.
This paper investigates a relatively new yet simple and effective approach of suggesting a rephrasing of potential hate speech content even before the post is made. We show that Large Language Models (LLMs) perform well on this task, outperforming state-of-the-art baselines such as BART-Detox. We develop 4 different prompts based on task description, hate definition, few-shot demonstrations and chain-of-thoughts for comprehensive experiments and conduct experiments on open-source LLMs such as LLaMA-1, LLaMA-2 chat, Vicuna as well as OpenAI's GPT-3.5. We propose various evaluation metrics to measure the efficacy of the generated text and ensure the generated text has reduced hate intensity without drastically changing the semantic meaning of the original text.
We find that LLMs with a few-shot demonstrations prompt work the best in generating acceptable hate-rephrased text with semantic meaning similar to the original text. Overall, we find that GPT-3.5 outperforms the baseline and open-source models for all the different kinds of prompts. We also perform human evaluations and interestingly, find that the rephrasings generated by GPT-3.5 outperform even the human-generated ground-truth rephrasings in the dataset. We also conduct detailed ablation studies to investigate why LLMs work satisfactorily on this task and conduct a failure analysis to understand the gaps.
△ Less
Submitted 21 October, 2023;
originally announced October 2023.
-
AI in the Gray: Exploring Moderation Policies in Dialogic Large Language Models vs. Human Answers in Controversial Topics
Authors:
Vahid Ghafouri,
Vibhor Agarwal,
Yong Zhang,
Nishanth Sastry,
Jose Such,
Guillermo Suarez-Tangil
Abstract:
The introduction of ChatGPT and the subsequent improvement of Large Language Models (LLMs) have prompted more and more individuals to turn to the use of ChatBots, both for information and assistance with decision-making. However, the information the user is after is often not formulated by these ChatBots objectively enough to be provided with a definite, globally accepted answer.
Controversial t…
▽ More
The introduction of ChatGPT and the subsequent improvement of Large Language Models (LLMs) have prompted more and more individuals to turn to the use of ChatBots, both for information and assistance with decision-making. However, the information the user is after is often not formulated by these ChatBots objectively enough to be provided with a definite, globally accepted answer.
Controversial topics, such as "religion", "gender identity", "freedom of speech", and "equality", among others, can be a source of conflict as partisan or biased answers can reinforce preconceived notions or promote disinformation. By exposing ChatGPT to such debatable questions, we aim to understand its level of awareness and if existing models are subject to socio-political and/or economic biases. We also aim to explore how AI-generated answers compare to human ones. For exploring this, we use a dataset of a social media platform created for the purpose of debating human-generated claims on polemic subjects among users, dubbed Kialo.
Our results show that while previous versions of ChatGPT have had important issues with controversial topics, more recent versions of ChatGPT (gpt-3.5-turbo) are no longer manifesting significant explicit biases in several knowledge areas. In particular, it is well-moderated regarding economic aspects. However, it still maintains degrees of implicit libertarian leaning toward right-winged ideals which suggest the need for increased moderation from the socio-political point of view. In terms of domain knowledge on controversial topics, with the exception of the "Philosophical" category, ChatGPT is performing well in keeping up with the collective human level of knowledge. Finally, we see that sources of Bing AI have slightly more tendency to the center when compared to human answers. All the analyses we make are generalizable to other types of biases and domains.
△ Less
Submitted 28 August, 2023;
originally announced August 2023.
-
Diffusion Models Beat GANs on Image Classification
Authors:
Soumik Mukhopadhyay,
Matthew Gwilliam,
Vatsal Agarwal,
Namitha Padmanabhan,
Archana Swaminathan,
Srinidhi Hegde,
Tianyi Zhou,
Abhinav Shrivastava
Abstract:
While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which uses a single pre-training stage to address both families of tasks simultaneously. We identify diffusion models as a prime candidate. Diffusion models have risen to prominence as a state-of-the-art method for image…
▽ More
While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which uses a single pre-training stage to address both families of tasks simultaneously. We identify diffusion models as a prime candidate. Diffusion models have risen to prominence as a state-of-the-art method for image generation, denoising, inpainting, super-resolution, manipulation, etc. Such models involve training a U-Net to iteratively predict and remove noise, and the resulting model can synthesize high fidelity, diverse, novel images. The U-Net architecture, as a convolution-based architecture, generates a diverse set of feature representations in the form of intermediate feature maps. We present our findings that these embeddings are useful beyond the noise prediction task, as they contain discriminative information and can also be leveraged for classification. We explore optimal methods for extracting and using these embeddings for classification tasks, demonstrating promising results on the ImageNet classification task. We find that with careful feature selection and pooling, diffusion models outperform comparable generative-discriminative methods such as BigBiGAN for classification tasks. We investigate diffusion models in the transfer learning regime, examining their performance on several fine-grained visual classification datasets. We compare these embeddings to those generated by competing architectures and pre-trainings for classification tasks.
△ Less
Submitted 17 July, 2023;
originally announced July 2023.
-
A clustering and graph deep learning-based framework for COVID-19 drug repurposing
Authors:
Chaarvi Bansal,
Rohitash Chandra,
Vinti Agarwal,
P. R. Deepa
Abstract:
Drug repurposing (or repositioning) is the process of finding new therapeutic uses for drugs already approved by drug regulatory authorities (e.g., the Food and Drug Administration (FDA) and Therapeutic Goods Administration (TGA)) for other diseases. This involves analyzing the interactions between different biological entities, such as drug targets (genes/proteins and biological pathways) and dru…
▽ More
Drug repurposing (or repositioning) is the process of finding new therapeutic uses for drugs already approved by drug regulatory authorities (e.g., the Food and Drug Administration (FDA) and Therapeutic Goods Administration (TGA)) for other diseases. This involves analyzing the interactions between different biological entities, such as drug targets (genes/proteins and biological pathways) and drug properties, to discover novel drug-target or drug-disease relations. Artificial intelligence methods such as machine learning and deep learning have successfully analyzed complex heterogeneous data in the biomedical domain and have also been used for drug repurposing. This study presents a novel unsupervised machine learning framework that utilizes a graph-based autoencoder for multi-feature type clustering on heterogeneous drug data. The dataset consists of 438 drugs, of which 224 are under clinical trials for COVID-19 (category A). The rest are systematically filtered to ensure the safety and efficacy of the treatment (category B). The framework solely relies on reported drug data, including its pharmacological properties, chemical/physical properties, interaction with the host, and efficacy in different publicly available COVID-19 assays. Our machine-learning framework reveals three clusters of interest and provides recommendations featuring the top 15 drugs for COVID-19 drug repurposing, which were shortlisted based on the predicted clusters that were dominated by category A drugs. The anti-COVID efficacy of the drugs should be verified by experimental studies. Our framework can be extended to support other datasets and drug repurposing studies, given open-source code and data availability.
△ Less
Submitted 24 June, 2023;
originally announced June 2023.
-
Suspicious Vehicle Detection Using Licence Plate Detection And Facial Feature Recognition
Authors:
Vrinda Agarwal,
Aaron George Pichappa,
Manideep Ramisetty,
Bala Murugan MS,
Manoj kumar Rajagopal
Abstract:
With the increasing need to strengthen vehicle safety and detection, the availability of pre-existing methods of catching criminals and identifying vehicles manually through the various traffic surveillance cameras is not only time-consuming but also inefficient. With the advancement of technology in every field the use of real-time traffic surveillance models will help facilitate an easy approach…
▽ More
With the increasing need to strengthen vehicle safety and detection, the availability of pre-existing methods of catching criminals and identifying vehicles manually through the various traffic surveillance cameras is not only time-consuming but also inefficient. With the advancement of technology in every field the use of real-time traffic surveillance models will help facilitate an easy approach. Keeping this in mind, the main focus of our paper is to develop a combined face recognition and number plate recognition model to ensure vehicle safety and real-time tracking of running-away criminals and stolen vehicles.
△ Less
Submitted 18 April, 2023;
originally announced April 2023.
-
AnnoBERT: Effectively Representing Multiple Annotators' Label Choices to Improve Hate Speech Detection
Authors:
Wenjie Yin,
Vibhor Agarwal,
Aiqi Jiang,
Arkaitz Zubiaga,
Nishanth Sastry
Abstract:
Supervised approaches generally rely on majority-based labels. However, it is hard to achieve high agreement among annotators in subjective tasks such as hate speech detection. Existing neural network models principally regard labels as categorical variables, while ignoring the semantic information in diverse label texts. In this paper, we propose AnnoBERT, a first-of-its-kind architecture integra…
▽ More
Supervised approaches generally rely on majority-based labels. However, it is hard to achieve high agreement among annotators in subjective tasks such as hate speech detection. Existing neural network models principally regard labels as categorical variables, while ignoring the semantic information in diverse label texts. In this paper, we propose AnnoBERT, a first-of-its-kind architecture integrating annotator characteristics and label text with a transformer-based model to detect hate speech, with unique representations based on each annotator's characteristics via Collaborative Topic Regression (CTR) and integrate label text to enrich textual representations. During training, the model associates annotators with their label choices given a piece of text; during evaluation, when label information is not available, the model predicts the aggregated label given by the participating annotators by utilising the learnt association. The proposed approach displayed an advantage in detecting hate speech, especially in the minority class and edge cases with annotator disagreement. Improvement in the overall performance is the largest when the dataset is more label-imbalanced, suggesting its practical value in identifying real-world hate speech, as the volume of hate speech in-the-wild is extremely small on social media, when compared with normal (non-hate) speech. Through ablation studies, we show the relative contributions of annotator embeddings and label text to the model performance, and tested a range of alternative annotator embeddings and label text combinations.
△ Less
Submitted 10 January, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
A Graph-Based Context-Aware Model to Understand Online Conversations
Authors:
Vibhor Agarwal,
Anthony P. Young,
Sagar Joglekar,
Nishanth Sastry
Abstract:
Online forums that allow for participatory engagement between users have been transformative for the public discussion of many important issues. However, such conversations can sometimes escalate into full-blown exchanges of hate and misinformation. Existing approaches in natural language processing (NLP), such as deep learning models for classification tasks, use as inputs only a single comment o…
▽ More
Online forums that allow for participatory engagement between users have been transformative for the public discussion of many important issues. However, such conversations can sometimes escalate into full-blown exchanges of hate and misinformation. Existing approaches in natural language processing (NLP), such as deep learning models for classification tasks, use as inputs only a single comment or a pair of comments depending upon whether the task concerns the inference of properties of the individual comments or the replies between pairs of comments, respectively. But in online conversations, comments and replies may be based on external context beyond the immediately relevant information that is input to the model. Therefore, being aware of the conversations' surrounding contexts should improve the model's performance for the inference task at hand.
We propose GraphNLI, a novel graph-based deep learning architecture that uses graph walks to incorporate the wider context of a conversation in a principled manner. Specifically, a graph walk starts from a given comment and samples "nearby" comments in the same or parallel conversation threads, which results in additional embeddings that are aggregated together with the initial comment's embedding. We then use these enriched embeddings for downstream NLP prediction tasks that are important for online conversations. We evaluate GraphNLI on two such tasks - polarity prediction and misogynistic hate speech detection - and found that our model consistently outperforms all relevant baselines for both tasks. Specifically, GraphNLI with a biased root-seeking random walk performs with a macro-F1 score of 3 and 6 percentage points better than the best-performing BERT-based baselines for the polarity prediction and hate speech detection tasks, respectively.
△ Less
Submitted 16 November, 2022;
originally announced November 2022.
-
The Desorption Rate at Liquid-Solid Interface
Authors:
Krishna Jaiswal,
Horia Metiu,
Vishal Agarwal
Abstract:
We use a simple generic model to study the desorption of atoms from a solid surface in contact with a liquid, by using a combination of Monte Carlo and molecular dynamics simulations. The behavior of the system depends on two parameters: the strength $ε_{LS}$ of the solid-liquid interaction energy and the strength $ε_{LL}$ of the liquid-liquid interaction energy. The contact with the solid surface…
▽ More
We use a simple generic model to study the desorption of atoms from a solid surface in contact with a liquid, by using a combination of Monte Carlo and molecular dynamics simulations. The behavior of the system depends on two parameters: the strength $ε_{LS}$ of the solid-liquid interaction energy and the strength $ε_{LL}$ of the liquid-liquid interaction energy. The contact with the solid surface modifies the structure of the adjacent liquid. Depending on the magnitude of the parameters $ε_{LS}$ and $ε_{LL}$ the density of the liquid oscillates with the distance from the surface for as far as five atomic layers. For other values of the parameters the density of the liquid near the surface is much lower than that of the bulk liquid, a process sometimes called dewetting. To describe the desorption rate we have determined the number $N(t)$ of atoms that were adsorbed initially and are still adsorbed at time $t$. The average of this quantity decays exponentially and this allows us to define a desorption rate coefficient $k_{d}$. We found that $k_{d}$ satisfies the Arrhenius relation. The logarithm of the pre-exponential is a linear function of the activation energy (the so-called compensation effect). We have also examined two approximate mean-field like theories of the rate constant: one calculates the activation free energy for desorption; the other uses transition state theory applied to the potential of mean force. Both methods fail to reproduce the exact desorption rate constant $k_{d}$. We show that this failure is due to the presence of substantial recrossing of the barrier (which introduces errors in transition state theory) and the presence of large fluctuations in the desorption rate of individual molecules whose effect is not captured by mean-field theories.
△ Less
Submitted 15 November, 2022;
originally announced November 2022.
-
Bounding Box Priors for Cell Detection with Point Annotations
Authors:
Hari Om Aggrawal,
Dipam Goswami,
Vinti Agarwal
Abstract:
The size of an individual cell type, such as a red blood cell, does not vary much among humans. We use this knowledge as a prior for classifying and detecting cells in images with only a few ground truth bounding box annotations, while most of the cells are annotated with points. This setting leads to weakly semi-supervised learning. We propose replacing points with either stochastic (ST) boxes or…
▽ More
The size of an individual cell type, such as a red blood cell, does not vary much among humans. We use this knowledge as a prior for classifying and detecting cells in images with only a few ground truth bounding box annotations, while most of the cells are annotated with points. This setting leads to weakly semi-supervised learning. We propose replacing points with either stochastic (ST) boxes or bounding box predictions during the training process. The proposed "mean-IOU" ST box maximizes the overlap with all the boxes belonging to the sample space with a class-specific approximated prior probability distribution of bounding boxes. Our method trains with both box- and point-labelled images in conjunction, unlike the existing methods, which train first with box- and then point-labelled images. In the most challenging setting, when only 5% images are box-labelled, quantitative experiments on a urine dataset show that our one-stage method outperforms two-stage methods by 5.56 mAP. Furthermore, we suggest an approach that partially answers "how many box-labelled annotations are necessary?" before training a machine learning model.
△ Less
Submitted 11 November, 2022;
originally announced November 2022.
-
Unsupervised machine learning framework for discriminating major variants of concern during COVID-19
Authors:
Rohitash Chandra,
Chaarvi Bansal,
Mingyue Kang,
Tom Blau,
Vinti Agarwal,
Pranjal Singh,
Laurence O. W. Wilson,
Seshadri Vasan
Abstract:
Due to the high mutation rate of the virus, the COVID-19 pandemic evolved rapidly. Certain variants of the virus, such as Delta and Omicron, emerged with altered viral properties leading to severe transmission and death rates. These variants burdened the medical systems worldwide with a major impact to travel, productivity, and the world economy. Unsupervised machine learning methods have the abil…
▽ More
Due to the high mutation rate of the virus, the COVID-19 pandemic evolved rapidly. Certain variants of the virus, such as Delta and Omicron, emerged with altered viral properties leading to severe transmission and death rates. These variants burdened the medical systems worldwide with a major impact to travel, productivity, and the world economy. Unsupervised machine learning methods have the ability to compress, characterize, and visualize unlabelled data. This paper presents a framework that utilizes unsupervised machine learning methods to discriminate and visualize the associations between major COVID-19 variants based on their genome sequences. These methods comprise a combination of selected dimensionality reduction and clustering techniques. The framework processes the RNA sequences by performing a k-mer analysis on the data and further visualises and compares the results using selected dimensionality reduction methods that include principal component analysis (PCA), t-distributed stochastic neighbour embedding (t-SNE), and uniform manifold approximation projection (UMAP). Our framework also employs agglomerative hierarchical clustering to visualize the mutational differences among major variants of concern and country-wise mutational differences for selected variants (Delta and Omicron) using dendrograms. We also provide country-wise mutational differences for selected variants via dendrograms. We find that the proposed framework can effectively distinguish between the major variants and has the potential to identify emerging variants in the future.
△ Less
Submitted 25 May, 2023; v1 submitted 1 August, 2022;
originally announced August 2022.
-
Broken News: Making Newspapers Accessible to Print-Impaired
Authors:
Vishal Agarwal,
Tanuja Ganu,
Saikat Guha
Abstract:
Accessing daily news content still remains a big challenge for people with print-impairment including blind and low-vision due to opacity of printed content and hindrance from online sources. In this paper, we present our approach for digitization of print newspaper into an accessible file format such as HTML. We use an ensemble of instance segmentation and detection framework for newspaper layout…
▽ More
Accessing daily news content still remains a big challenge for people with print-impairment including blind and low-vision due to opacity of printed content and hindrance from online sources. In this paper, we present our approach for digitization of print newspaper into an accessible file format such as HTML. We use an ensemble of instance segmentation and detection framework for newspaper layout analysis and then OCR to recognize text elements such as headline and article text. Additionally, we propose EdgeMask loss function for Mask-RCNN framework to improve segmentation mask boundary and hence accuracy of downstream OCR task. Empirically, we show that our proposed loss function reduces the Word Error Rate (WER) of news article text by 32.5 %.
△ Less
Submitted 23 June, 2022; v1 submitted 21 June, 2022;
originally announced June 2022.
-
Neural Language Taskonomy: Which NLP Tasks are the most Predictive of fMRI Brain Activity?
Authors:
Subba Reddy Oota,
Jashn Arora,
Veeral Agarwal,
Mounika Marreddy,
Manish Gupta,
Bapi Raju Surampudi
Abstract:
Several popular Transformer based language models have been found to be successful for text-driven brain encoding. However, existing literature leverages only pretrained text Transformer models and has not explored the efficacy of task-specific learned Transformer representations. In this work, we explore transfer learning from representations learned for ten popular natural language processing ta…
▽ More
Several popular Transformer based language models have been found to be successful for text-driven brain encoding. However, existing literature leverages only pretrained text Transformer models and has not explored the efficacy of task-specific learned Transformer representations. In this work, we explore transfer learning from representations learned for ten popular natural language processing tasks (two syntactic and eight semantic) for predicting brain responses from two diverse datasets: Pereira (subjects reading sentences from paragraphs) and Narratives (subjects listening to the spoken stories). Encoding models based on task features are used to predict activity in different regions across the whole brain. Features from coreference resolution, NER, and shallow syntax parsing explain greater variance for the reading activity. On the other hand, for the listening activity, tasks such as paraphrase generation, summarization, and natural language inference show better encoding performance. Experiments across all 10 task representations provide the following cognitive insights: (i) language left hemisphere has higher predictive brain activity versus language right hemisphere, (ii) posterior medial cortex, temporo-parieto-occipital junction, dorsal frontal lobe have higher correlation versus early auditory and auditory association cortex, (iii) syntactic and semantic tasks display a good predictive performance across brain regions for reading and listening stimuli resp.
△ Less
Submitted 3 May, 2022;
originally announced May 2022.
-
"Way back then": A Data-driven View of 25+ years of Web Evolution
Authors:
Vibhor Agarwal,
Nishanth Sastry
Abstract:
Since the inception of the first web page three decades back, the Web has evolved considerably, from static HTML pages in the beginning to the dynamic web pages of today, from mainly the text-based pages of the 1990s to today's multimedia rich pages, etc. Although much of this is known anecdotally, to our knowledge, there is no quantitative documentation of the extent and timing of these changes.…
▽ More
Since the inception of the first web page three decades back, the Web has evolved considerably, from static HTML pages in the beginning to the dynamic web pages of today, from mainly the text-based pages of the 1990s to today's multimedia rich pages, etc. Although much of this is known anecdotally, to our knowledge, there is no quantitative documentation of the extent and timing of these changes. This paper attempts to address this gap in the literature by looking at the top 100 Alexa websites for over 25 years from the Internet Archive or the "Wayback Machine", archive.org. We study the changes in popularity, from Geocities and Yahoo! in the mid-to-late 1990s to the likes of Google, Facebook, and Tiktok of today. We also look at different categories of websites and their popularity over the years and find evidence for the decline in popularity of news and education-related websites, which have been replaced by streaming media and social networking sites. We explore the emergence and relative prevalence of different MIME-types (text vs. image vs. video vs. javascript and json) and study whether the use of text on the Internet is declining.
△ Less
Submitted 16 February, 2022;
originally announced February 2022.
-
GraphNLI: A Graph-based Natural Language Inference Model for Polarity Prediction in Online Debates
Authors:
Vibhor Agarwal,
Sagar Joglekar,
Anthony P. Young,
Nishanth Sastry
Abstract:
Online forums that allow participatory engagement between users have been transformative for public discussion of important issues. However, debates on such forums can sometimes escalate into full blown exchanges of hate or misinformation. An important tool in understanding and tackling such problems is to be able to infer the argumentative relation of whether a reply is supporting or attacking th…
▽ More
Online forums that allow participatory engagement between users have been transformative for public discussion of important issues. However, debates on such forums can sometimes escalate into full blown exchanges of hate or misinformation. An important tool in understanding and tackling such problems is to be able to infer the argumentative relation of whether a reply is supporting or attacking the post it is replying to. This so called polarity prediction task is difficult because replies may be based on external context beyond a post and the reply whose polarity is being predicted. We propose GraphNLI, a novel graph-based deep learning architecture that uses graph walk techniques to capture the wider context of a discussion thread in a principled fashion. Specifically, we propose methods to perform root-seeking graph walks that start from a post and captures its surrounding context to generate additional embeddings for the post. We then use these embeddings to predict the polarity relation between a reply and the post it is replying to. We evaluate the performance of our models on a curated debate dataset from Kialo, an online debating platform. Our model outperforms relevant baselines, including S-BERT, with an overall accuracy of 83%.
△ Less
Submitted 16 February, 2022;
originally announced February 2022.
-
PrivPAS: A real time Privacy-Preserving AI System and applied ethics
Authors:
Harichandana B S S,
Vibhav Agarwal,
Sourav Ghosh,
Gopi Ramena,
Sumit Kumar,
Barath Raj Kandur Raja
Abstract:
With 3.78 billion social media users worldwide in 2021 (48% of the human population), almost 3 billion images are shared daily. At the same time, a consistent evolution of smartphone cameras has led to a photography explosion with 85% of all new pictures being captured using smartphones. However, lately, there has been an increased discussion of privacy concerns when a person being photographed is…
▽ More
With 3.78 billion social media users worldwide in 2021 (48% of the human population), almost 3 billion images are shared daily. At the same time, a consistent evolution of smartphone cameras has led to a photography explosion with 85% of all new pictures being captured using smartphones. However, lately, there has been an increased discussion of privacy concerns when a person being photographed is unaware of the picture being taken or has reservations about the same being shared. These privacy violations are amplified for people with disabilities, who may find it challenging to raise dissent even if they are aware. Such unauthorized image captures may also be misused to gain sympathy by third-party organizations, leading to a privacy breach. Privacy for people with disabilities has so far received comparatively less attention from the AI community. This motivates us to work towards a solution to generate privacy-conscious cues for raising awareness in smartphone users of any sensitivity in their viewfinder content. To this end, we introduce PrivPAS (A real time Privacy-Preserving AI System) a novel framework to identify sensitive content. Additionally, we curate and annotate a dataset to identify and localize accessibility markers and classify whether an image is sensitive to a featured subject with a disability. We demonstrate that the proposed lightweight architecture, with a memory footprint of a mere 8.49MB, achieves a high mAP of 89.52% on resource-constrained devices. Furthermore, our pipeline, trained on face anonymized data, achieves an F1-score of 73.1%.
△ Less
Submitted 8 February, 2022; v1 submitted 5 February, 2022;
originally announced February 2022.
-
Object-based synthesis of scraping and rolling sounds based on non-linear physical constraints
Authors:
Vinayak Agarwal,
Maddie Cusimano,
James Traer,
Josh McDermott
Abstract:
Sustained contact interactions like scraping and rolling produce a wide variety of sounds. Previous studies have explored ways to synthesize these sounds efficiently and intuitively but could not fully mimic the rich structure of real instances of these sounds. We present a novel source-filter model for realistic synthesis of scraping and rolling sounds with physically and perceptually relevant co…
▽ More
Sustained contact interactions like scraping and rolling produce a wide variety of sounds. Previous studies have explored ways to synthesize these sounds efficiently and intuitively but could not fully mimic the rich structure of real instances of these sounds. We present a novel source-filter model for realistic synthesis of scraping and rolling sounds with physically and perceptually relevant controllable parameters constrained by principles of mechanics. Key features of our model include non-linearities to constrain the contact force, naturalistic normal force variation for different motions, and a method for morphing impulse responses within a material to achieve location-dependence. Perceptual experiments show that the presented model is able to synthesize realistic scraping and rolling sounds while conveying physical information similar to that in recorded sounds.
△ Less
Submitted 16 December, 2021;
originally announced December 2021.
-
Urine Microscopic Image Dataset
Authors:
Dipam Goswami,
Hari Om Aggrawal,
Rajiv Gupta,
Vinti Agarwal
Abstract:
Urinalysis is a standard diagnostic test to detect urinary system related problems. The automation of urinalysis will reduce the overall diagnostic time. Recent studies used urine microscopic datasets for designing deep learning based algorithms to classify and detect urine cells. But these datasets are not publicly available for further research. To alleviate the need for urine datsets, we prepar…
▽ More
Urinalysis is a standard diagnostic test to detect urinary system related problems. The automation of urinalysis will reduce the overall diagnostic time. Recent studies used urine microscopic datasets for designing deep learning based algorithms to classify and detect urine cells. But these datasets are not publicly available for further research. To alleviate the need for urine datsets, we prepare our urine sediment microscopic image (UMID) dataset comprising of around 3700 cell annotations and 3 categories of cells namely RBC, pus and epithelial cells. We discuss the several challenges involved in preparing the dataset and the annotations. We make the dataset publicly available.
△ Less
Submitted 19 November, 2021;
originally announced November 2021.
-
A Frequency Perspective of Adversarial Robustness
Authors:
Shishira R Maiya,
Max Ehrlich,
Vatsal Agarwal,
Ser-Nam Lim,
Tom Goldstein,
Abhinav Shrivastava
Abstract:
Adversarial examples pose a unique challenge for deep learning systems. Despite recent advances in both attacks and defenses, there is still a lack of clarity and consensus in the community about the true nature and underlying properties of adversarial examples. A deep understanding of these examples can provide new insights towards the development of more effective attacks and defenses. Driven by…
▽ More
Adversarial examples pose a unique challenge for deep learning systems. Despite recent advances in both attacks and defenses, there is still a lack of clarity and consensus in the community about the true nature and underlying properties of adversarial examples. A deep understanding of these examples can provide new insights towards the development of more effective attacks and defenses. Driven by the common misconception that adversarial examples are high-frequency noise, we present a frequency-based understanding of adversarial examples, supported by theoretical and empirical findings. Our analysis shows that adversarial examples are neither in high-frequency nor in low-frequency components, but are simply dataset dependent. Particularly, we highlight the glaring disparities between models trained on CIFAR-10 and ImageNet-derived datasets. Utilizing this framework, we analyze many intriguing properties of training robust models with frequency constraints, and propose a frequency-based explanation for the commonly observed accuracy vs. robustness trade-off.
△ Less
Submitted 26 October, 2021;
originally announced November 2021.
-
LIDSNet: A Lightweight on-device Intent Detection model using Deep Siamese Network
Authors:
Vibhav Agarwal,
Sudeep Deepak Shivnikar,
Sourav Ghosh,
Himanshu Arora,
Yashwant Saini
Abstract:
Intent detection is a crucial task in any Natural Language Understanding (NLU) system and forms the foundation of a task-oriented dialogue system. To build high-quality real-world conversational solutions for edge devices, there is a need for deploying intent detection model on device. This necessitates a light-weight, fast, and accurate model that can perform efficiently in a resource-constrained…
▽ More
Intent detection is a crucial task in any Natural Language Understanding (NLU) system and forms the foundation of a task-oriented dialogue system. To build high-quality real-world conversational solutions for edge devices, there is a need for deploying intent detection model on device. This necessitates a light-weight, fast, and accurate model that can perform efficiently in a resource-constrained environment. To this end, we propose LIDSNet, a novel lightweight on-device intent detection model, which accurately predicts the message intent by utilizing a Deep Siamese Network for learning better sentence representations. We use character-level features to enrich the sentence-level representations and empirically demonstrate the advantage of transfer learning by utilizing pre-trained embeddings. Furthermore, to investigate the efficacy of the modules in our architecture, we conduct an ablation study and arrive at our optimal model. Experimental results prove that LIDSNet achieves state-of-the-art competitive accuracy of 98.00% and 95.97% on SNIPS and ATIS public datasets respectively, with under 0.59M parameters. We further benchmark LIDSNet against fine-tuned BERTs and show that our model is at least 41x lighter and 30x faster during inference than MobileBERT on Samsung Galaxy S20 device, justifying its efficiency on resource-constrained edge devices.
△ Less
Submitted 6 October, 2021;
originally announced October 2021.
-
Invariant Language Modeling
Authors:
Maxime Peyrard,
Sarvjeet Singh Ghotra,
Martin Josifoski,
Vidhan Agarwal,
Barun Patra,
Dean Carignan,
Emre Kiciman,
Robert West
Abstract:
Large pretrained language models are critical components of modern NLP pipelines. Yet, they suffer from spurious correlations, poor out-of-domain generalization, and biases. Inspired by recent progress in causal machine learning, in particular the invariant risk minimization (IRM) paradigm, we propose invariant language modeling, a framework for learning invariant representations that generalize b…
▽ More
Large pretrained language models are critical components of modern NLP pipelines. Yet, they suffer from spurious correlations, poor out-of-domain generalization, and biases. Inspired by recent progress in causal machine learning, in particular the invariant risk minimization (IRM) paradigm, we propose invariant language modeling, a framework for learning invariant representations that generalize better across multiple environments. In particular, we adapt a game-theoretic formulation of IRM (IRM-games) to language models, where the invariance emerges from a specific training schedule in which all the environments compete to optimize their own environment-specific loss by updating subsets of the model in a round-robin fashion. We focus on controlled experiments to precisely demonstrate the ability of our method to (i) remove structured noise, (ii) ignore specific spurious correlations without affecting global performance, and (iii) achieve better out-of-domain generalization. These benefits come with a negligible computational overhead compared to standard training, do not require changing the local loss, and can be applied to any language model. We believe this framework is promising to help mitigate spurious correlations and biases in language models.
△ Less
Submitted 14 November, 2022; v1 submitted 15 October, 2021;
originally announced October 2021.
-
A novel approach for modelling and classifying sit-to-stand kinematics using inertial sensors
Authors:
Maitreyee Wairagkar,
Emma Villeneuve,
Rachel King,
Balazs Janko,
Malcolm Burnett,
Ann Ashburn,
Veena Agarwal,
R. Simon Sherratt,
William Holderbaum,
William Harwin
Abstract:
Sit-to-stand transitions are an important part of activities of daily living and play a key role in functional mobility in humans. The sit-to-stand movement is often affected in older adults due to frailty and in patients with motor impairments such as Parkinson's disease leading to falls. Studying kinematics of sit-to-stand transitions can provide insight in assessment, monitoring and developing…
▽ More
Sit-to-stand transitions are an important part of activities of daily living and play a key role in functional mobility in humans. The sit-to-stand movement is often affected in older adults due to frailty and in patients with motor impairments such as Parkinson's disease leading to falls. Studying kinematics of sit-to-stand transitions can provide insight in assessment, monitoring and developing rehabilitation strategies for the affected populations. We propose a three-segment body model for estimating sit-to-stand kinematics using only two wearable inertial sensors, placed on the shank and back. Reducing the number of sensors to two instead of one per body segment facilitates monitoring and classifying movements over extended periods, making it more comfortable to wear while reducing the power requirements of sensors. We applied this model on 10 younger healthy adults (YH), 12 older healthy adults (OH) and 12 people with Parkinson's disease (PwP). We have achieved this by incorporating unique sit-to-stand classification technique using unsupervised learning in the model based reconstruction of angular kinematics using extended Kalman filter. Our proposed model showed that it was possible to successfully estimate thigh kinematics despite not measuring the thigh motion with inertial sensor. We classified sit-to-stand transitions, sitting and standing states with the accuracies of 98.67%, 94.20% and 91.41% for YH, OH and PwP respectively. We have proposed a novel integrated approach of modelling and classification for estimating the body kinematics during sit-to-stand motion and successfully applied it on YH, OH and PwP groups.
△ Less
Submitted 14 July, 2021;
originally announced July 2021.
-
Analyzing Images for Music Recommendation
Authors:
Anant Baijal,
Vivek Agarwal,
Danny Hyun
Abstract:
Experiencing images with suitable music can greatly enrich the overall user experience. The proposed image analysis method treats an artwork image differently from a photograph image. Automatic image classification is performed using deep-learning based models. An illustrative analysis showcasing the ability of our deep-models to inherently learn and utilize perceptually relevant features when cla…
▽ More
Experiencing images with suitable music can greatly enrich the overall user experience. The proposed image analysis method treats an artwork image differently from a photograph image. Automatic image classification is performed using deep-learning based models. An illustrative analysis showcasing the ability of our deep-models to inherently learn and utilize perceptually relevant features when classifying artworks is also presented. The Mean Opinion Score (MOS) obtained from subjective assessments of the respective image and recommended music pairs supports the effectiveness of our approach.
△ Less
Submitted 15 May, 2021;
originally announced May 2021.
-
Analyzing the Nuances of Transformers' Polynomial Simplification Abilities
Authors:
Vishesh Agarwal,
Somak Aditya,
Navin Goyal
Abstract:
Symbolic Mathematical tasks such as integration often require multiple well-defined steps and understanding of sub-tasks to reach a solution. To understand Transformers' abilities in such tasks in a fine-grained manner, we deviate from traditional end-to-end settings, and explore a step-wise polynomial simplification task. Polynomials can be written in a simple normal form as a sum of monomials wh…
▽ More
Symbolic Mathematical tasks such as integration often require multiple well-defined steps and understanding of sub-tasks to reach a solution. To understand Transformers' abilities in such tasks in a fine-grained manner, we deviate from traditional end-to-end settings, and explore a step-wise polynomial simplification task. Polynomials can be written in a simple normal form as a sum of monomials which are ordered in a lexicographic order. For a polynomial which is not necessarily in this normal form, a sequence of simplification steps is applied to reach the fully simplified (i.e., in the normal form) polynomial. We propose a synthetic Polynomial dataset generation algorithm that generates polynomials with unique proof steps. Through varying coefficient configurations, input representation, proof granularity, and extensive hyper-parameter tuning, we observe that Transformers consistently struggle with numeric multiplication. We explore two ways to mitigate this: Curriculum Learning and a Symbolic Calculator approach (where the numeric operations are offloaded to a calculator). Both approaches provide significant gains over the vanilla Transformers-based baseline.
△ Less
Submitted 28 April, 2021;
originally announced April 2021.
-
FONTNET: On-Device Font Understanding and Prediction Pipeline
Authors:
Rakshith S,
Rishabh Khurana,
Vibhav Agarwal,
Jayesh Rajkumar Vachhani,
Guggilla Bhanodai
Abstract:
Fonts are one of the most basic and core design concepts. Numerous use cases can benefit from an in depth understanding of Fonts such as Text Customization which can change text in an image while maintaining the Font attributes like style, color, size. Currently, Text recognition solutions can group recognized text based on line breaks or paragraph breaks, if the Font attributes are known multiple…
▽ More
Fonts are one of the most basic and core design concepts. Numerous use cases can benefit from an in depth understanding of Fonts such as Text Customization which can change text in an image while maintaining the Font attributes like style, color, size. Currently, Text recognition solutions can group recognized text based on line breaks or paragraph breaks, if the Font attributes are known multiple text blocks can be combined based on context in a meaningful manner. In this paper, we propose two engines: Font Detection Engine, which identifies the font style, color and size attributes of text in an image and a Font Prediction Engine, which predicts similar fonts for a query font. Major contributions of this paper are three-fold: First, we developed a novel CNN architecture for identifying font style of text in images. Second, we designed a novel algorithm for predicting similar fonts for a given query font. Third, we have optimized and deployed the entire engine On-Device which ensures privacy and improves latency in real time applications such as instant messaging. We achieve a worst case On-Device inference time of 30ms and a model size of 4.5MB for both the engines.
△ Less
Submitted 30 March, 2021;
originally announced March 2021.
-
Addressing catastrophic forgetting for medical domain expansion
Authors:
Sharut Gupta,
Praveer Singh,
Ken Chang,
Liangqiong Qu,
Mehak Aggarwal,
Nishanth Arun,
Ashwin Vaswani,
Shruti Raghavan,
Vibha Agarwal,
Mishka Gidwani,
Katharina Hoebel,
Jay Patel,
Charles Lu,
Christopher P. Bridge,
Daniel L. Rubin,
Jayashree Kalpathy-Cramer
Abstract:
Model brittleness is a key concern when deploying deep learning models in real-world medical settings. A model that has high performance at one institution may suffer a significant decline in performance when tested at other institutions. While pooling datasets from multiple institutions and retraining may provide a straightforward solution, it is often infeasible and may compromise patient privac…
▽ More
Model brittleness is a key concern when deploying deep learning models in real-world medical settings. A model that has high performance at one institution may suffer a significant decline in performance when tested at other institutions. While pooling datasets from multiple institutions and retraining may provide a straightforward solution, it is often infeasible and may compromise patient privacy. An alternative approach is to fine-tune the model on subsequent institutions after training on the original institution. Notably, this approach degrades model performance at the original institution, a phenomenon known as catastrophic forgetting. In this paper, we develop an approach to address catastrophic forget-ting based on elastic weight consolidation combined with modulation of batch normalization statistics under two scenarios: first, for expanding the domain from one imaging system's data to another imaging system's, and second, for expanding the domain from a large multi-institutional dataset to another single institution dataset. We show that our approach outperforms several other state-of-the-art approaches and provide theoretical justification for the efficacy of batch normalization modulation. The results of this study are generally applicable to the deployment of any clinical deep learning model which requires domain expansion.
△ Less
Submitted 24 March, 2021;
originally announced March 2021.
-
Optimization of wide-band quasi-omnidirectional 1-D photonic structures
Authors:
Victor Castillo-Gallardo,
Luis Eduardo Puente-Díaz,
David Ariza-Flores,
Héctor Pérez-Aguilar,
W. Luis Mochán,
Vivechana Agarwal
Abstract:
We have designed, optimized, fabricated and characterized highly reflective quasi-omnidirectional (angular range of $0-60^\circ$) multilayered structures with a wide spectral range. Two techniques, chirping (a continuous change in thicknesses) and stacking of Bragg-type sub-structures, have been used to enhance the reflectance with minimum thickness for a given pair of refractive indices. Numerica…
▽ More
We have designed, optimized, fabricated and characterized highly reflective quasi-omnidirectional (angular range of $0-60^\circ$) multilayered structures with a wide spectral range. Two techniques, chirping (a continuous change in thicknesses) and stacking of Bragg-type sub-structures, have been used to enhance the reflectance with minimum thickness for a given pair of refractive indices. Numerical calculations were carried out employing the transfer matrix method and we optimized the design parameters to obtain maximal reflectance averaged over different spectral ranges for all angles. We fabricated some of the optimized structures with porous silicon dielectric multilayers with low refractive contrast and compared their measured optical properties with the calculations. Two chirped structures with thicknesses 21.6 $μ$m and 60.4 $μ$m, resulting in quasi omnidirectional mirrors with bandwidths of 360 nm and 1800 nm, centered at 1160 nm and 1925 nm respectively have been shown. In addition, we fabricated a stacked sub-Bragg mirror structure with a quasi omnidirectional bandwidth of 1800 nm (centered at 1850 nm) and a thickness of 41.5 $μ$m, which is almost two third (in thickness) of the chirped structure. Thus, our techniques allowed us to obtain relatively thin quasi-omnidirectional mirrors with wide bands over different wavelength ranges. Our analysis techniques can be used for the optimization of the reflectance not only in multilayered PS systems with different refractive index contrasts but also for systems with other types of materials with low refractive index contrast. The present study could also be useful for obtaining omnidirectional dielectric mirrors in large spectral regions using different materials, flat focusing reflectors, thermal regulators, or, if defects are included, as filters or remote chemical/biosensors with a wide angular independent response.
△ Less
Submitted 23 March, 2021; v1 submitted 9 March, 2021;
originally announced March 2021.
-
Differential Tracking Across Topical Webpages of Indian News Media
Authors:
Yash Vekaria,
Vibhor Agarwal,
Pushkal Agarwal,
Sangeeta Mahapatra,
Sakthi Balan Muthiah,
Nishanth Sastry,
Nicolas Kourtellis
Abstract:
Online user privacy and tracking have been extensively studied in recent years, especially due to privacy and personal data-related legislations in the EU and the USA, such as the General Data Protection Regulation, ePrivacy Regulation, and California Consumer Privacy Act. Research has revealed novel tracking and personal identifiable information leakage methods that first- and third-parties emplo…
▽ More
Online user privacy and tracking have been extensively studied in recent years, especially due to privacy and personal data-related legislations in the EU and the USA, such as the General Data Protection Regulation, ePrivacy Regulation, and California Consumer Privacy Act. Research has revealed novel tracking and personal identifiable information leakage methods that first- and third-parties employ on websites around the world, as well as the intensity of tracking performed on such websites. However, for the sake of scaling to cover a large portion of the Web, most past studies focused on homepages of websites, and did not look deeper into the tracking practices on their topical subpages. The majority of studies focused on the Global North markets such as the EU and the USA. Large markets such as India, which covers 20% of the world population and has no explicit privacy laws, have not been studied in this regard.
We aim to address these gaps and focus on the following research questions: Is tracking on topical subpages of Indian news websites different from their homepage? Do third-party trackers prefer to track specific topics? How does this preference compare to the similarity of content shown on these topical subpages? To answer these questions, we propose a novel method for automatic extraction and categorization of Indian news topical subpages based on the details in their URLs. We study the identified topical subpages and compare them with their homepages with respect to the intensity of cookie injection and third-party embeddedness and type. We find differential user tracking among subpages, and between subpages and homepages. We also find a preferential attachment of third-party trackers to specific topics. Also, embedded third-parties tend to track specific subpages simultaneously, revealing possible user profiling in action.
△ Less
Submitted 7 March, 2021;
originally announced March 2021.
-
Under the Spotlight: Web Tracking in Indian Partisan News Websites
Authors:
Vibhor Agarwal,
Yash Vekaria,
Pushkal Agarwal,
Sangeeta Mahapatra,
Shounak Set,
Sakthi Balan Muthiah,
Nishanth Sastry,
Nicolas Kourtellis
Abstract:
India is experiencing intense political partisanship and sectarian divisions. The paper performs, to the best of our knowledge, the first comprehensive analysis on the Indian online news media with respect to tracking and partisanship. We build a dataset of 103 online, mostly mainstream news websites. With the help of two experts, alongside data from the Media Ownership Monitor of the Reporters wi…
▽ More
India is experiencing intense political partisanship and sectarian divisions. The paper performs, to the best of our knowledge, the first comprehensive analysis on the Indian online news media with respect to tracking and partisanship. We build a dataset of 103 online, mostly mainstream news websites. With the help of two experts, alongside data from the Media Ownership Monitor of the Reporters without Borders, we label these websites according to their partisanship (Left, Right, or Centre). We study and compare user tracking on these sites with different metrics: numbers of cookies, cookie synchronizations, device fingerprinting, and invisible pixel-based tracking. We find that Left and Centre websites serve more cookies than Right-leaning websites. However, through cookie synchronization, more user IDs are synchronized in Left websites than Right or Centre. Canvas fingerprinting is used similarly by Left and Right, and less by Centre. Invisible pixel-based tracking is 50% more intense in Centre-leaning websites than Right, and 25% more than Left. Desktop versions of news websites deliver more cookies than their mobile counterparts. A handful of third-parties are tracking users in most websites in this study. This paper, by demonstrating intense web tracking, has implications for research on overall privacy of users visiting partisan news websites in India.
△ Less
Submitted 8 March, 2021; v1 submitted 6 February, 2021;
originally announced February 2021.
-
Analytical Model for the Current Density in the Electrochemical Synthesis of Porous Silicon Structures with a Lateral Gradient
Authors:
C. A. Ospina-Delacruz,
V. Agarwal,
W. L. Mochán
Abstract:
Layered optical devices with a lateral gradient can be fabricated through electrochemical synthesis of porous silicon (PS) using a position dependent etching current density $\bm j(\bm r_\|)$. Predicting the local value of $\bm j(\bm r_\|)$ and the corresponding porosity $p(\bm r_\|)$ and etching rate $v(\bm r_\|)$ is desirable for their systematic design. We develop a simple analytical model for…
▽ More
Layered optical devices with a lateral gradient can be fabricated through electrochemical synthesis of porous silicon (PS) using a position dependent etching current density $\bm j(\bm r_\|)$. Predicting the local value of $\bm j(\bm r_\|)$ and the corresponding porosity $p(\bm r_\|)$ and etching rate $v(\bm r_\|)$ is desirable for their systematic design. We develop a simple analytical model for the calculation of $\bm j(\bm r_\|)$ within a prism shaped cell. Graded single layer PS samples were synthesized and their local calibration curves $p$ vs $\bm j$ and $v$ vs $\bm j$ were obtained from our model and their reflectance spectra. The agreement found between the calibration curves from different samples shows that from one sample we could obtain full calibration curves which may be used to predict, design, and fabricate more complex non-homogeneous multilayered devices with lateral gradients for manifold applications.
△ Less
Submitted 22 January, 2021;
originally announced January 2021.
-
EmpLite: A Lightweight Sequence Labeling Model for Emphasis Selection of Short Texts
Authors:
Vibhav Agarwal,
Sourav Ghosh,
Kranti Chalamalasetti,
Bharath Challa,
Sonal Kumari,
Harshavardhana,
Barath Raj Kandur Raja
Abstract:
Word emphasis in textual content aims at conveying the desired intention by changing the size, color, typeface, style (bold, italic, etc.), and other typographical features. The emphasized words are extremely helpful in drawing the readers' attention to specific information that the authors wish to emphasize. However, performing such emphasis using a soft keyboard for social media interactions is…
▽ More
Word emphasis in textual content aims at conveying the desired intention by changing the size, color, typeface, style (bold, italic, etc.), and other typographical features. The emphasized words are extremely helpful in drawing the readers' attention to specific information that the authors wish to emphasize. However, performing such emphasis using a soft keyboard for social media interactions is time-consuming and has an associated learning curve. In this paper, we propose a novel approach to automate the emphasis word detection on short written texts. To the best of our knowledge, this work presents the first lightweight deep learning approach for smartphone deployment of emphasis selection. Experimental results show that our approach achieves comparable accuracy at a much lower model size than existing models. Our best lightweight model has a memory footprint of 2.82 MB with a matching score of 0.716 on SemEval-2020 public benchmark dataset.
△ Less
Submitted 15 December, 2020;
originally announced January 2021.
-
LiteMuL: A Lightweight On-Device Sequence Tagger using Multi-task Learning
Authors:
Sonal Kumari,
Vibhav Agarwal,
Bharath Challa,
Kranti Chalamalasetti,
Sourav Ghosh,
Harshavardhana,
Barath Raj Kandur Raja
Abstract:
Named entity detection and Parts-of-speech tagging are the key tasks for many NLP applications. Although the current state of the art methods achieved near perfection for long, formal, structured text there are hindrances in deploying these models on memory-constrained devices such as mobile phones. Furthermore, the performance of these models is degraded when they encounter short, informal, and c…
▽ More
Named entity detection and Parts-of-speech tagging are the key tasks for many NLP applications. Although the current state of the art methods achieved near perfection for long, formal, structured text there are hindrances in deploying these models on memory-constrained devices such as mobile phones. Furthermore, the performance of these models is degraded when they encounter short, informal, and casual conversations. To overcome these difficulties, we present LiteMuL - a lightweight on-device sequence tagger that can efficiently process the user conversations using a Multi-Task Learning (MTL) approach. To the best of our knowledge, the proposed model is the first on-device MTL neural model for sequence tagging. Our LiteMuL model is about 2.39 MB in size and achieved an accuracy of 0.9433 (for NER), 0.9090 (for POS) on the CoNLL 2003 dataset. The proposed LiteMuL not only outperforms the current state of the art results but also surpasses the results of our proposed on-device task-specific models, with accuracy gains of up to 11% and model-size reduction by 50%-56%. Our model is competitive with other MTL approaches for NER and POS tasks while outshines them with a low memory footprint. We also evaluated our model on custom-curated user conversations and observed impressive results.
△ Less
Submitted 29 March, 2021; v1 submitted 15 December, 2020;
originally announced January 2021.
-
Audrey: A Personalized Open-Domain Conversational Bot
Authors:
Chung Hoon Hong,
Yuan Liang,
Sagnik Sinha Roy,
Arushi Jain,
Vihang Agarwal,
Ryan Draves,
Zhizhuo Zhou,
William Chen,
Yujian Liu,
Martha Miracky,
Lily Ge,
Nikola Banovic,
David Jurgens
Abstract:
Conversational Intelligence requires that a person engage on informational, personal and relational levels. Advances in Natural Language Understanding have helped recent chatbots succeed at dialog on the informational level. However, current techniques still lag for conversing with humans on a personal level and fully relating to them. The University of Michigan's submission to the Alexa Prize Gra…
▽ More
Conversational Intelligence requires that a person engage on informational, personal and relational levels. Advances in Natural Language Understanding have helped recent chatbots succeed at dialog on the informational level. However, current techniques still lag for conversing with humans on a personal level and fully relating to them. The University of Michigan's submission to the Alexa Prize Grand Challenge 3, Audrey, is an open-domain conversational chat-bot that aims to engage customers on these levels through interest driven conversations guided by customers' personalities and emotions. Audrey is built from socially-aware models such as Emotion Detection and a Personal Understanding Module to grasp a deeper understanding of users' interests and desires. Our architecture interacts with customers using a hybrid approach balanced between knowledge-driven response generators and context-driven neural response generators to cater to all three levels of conversations. During the semi-finals period, we achieved an average cumulative rating of 3.25 on a 1-5 Likert scale.
△ Less
Submitted 11 November, 2020;
originally announced November 2020.
-
Extracting Procedural Knowledge from Technical Documents
Authors:
Shivali Agarwal,
Shubham Atreja,
Vikas Agarwal
Abstract:
Procedures are an important knowledge component of documents that can be leveraged by cognitive assistants for automation, question-answering or driving a conversation. It is a challenging problem to parse big dense documents like product manuals, user guides to automatically understand which parts are talking about procedures and subsequently extract them. Most of the existing research has focuse…
▽ More
Procedures are an important knowledge component of documents that can be leveraged by cognitive assistants for automation, question-answering or driving a conversation. It is a challenging problem to parse big dense documents like product manuals, user guides to automatically understand which parts are talking about procedures and subsequently extract them. Most of the existing research has focused on extracting flows in given procedures or understanding the procedures in order to answer conceptual questions. Identifying and extracting multiple procedures automatically from documents of diverse formats remains a relatively less addressed problem. In this work, we cover some of this ground by -- 1) Providing insights on how structural and linguistic properties of documents can be grouped to define types of procedures, 2) Analyzing documents to extract the relevant linguistic and structural properties, and 3) Formulating procedure identification as a classification problem that leverages the features of the document derived from the above analysis. We first implemented and deployed unsupervised techniques which were used in different use cases. Based on the evaluation in different use cases, we figured out the weaknesses of the unsupervised approach. We then designed an improved version which was supervised. We demonstrate that our technique is effective in identifying procedures from big and complex documents alike by achieving accuracy of 89%.
△ Less
Submitted 20 October, 2020;
originally announced October 2020.
-
Stable Calculation of Optical Properties of Large Non-Periodic Dissipative Multilayered Systems
Authors:
Luis Eduardo Puente-Díaz,
Victor Castillo-Gallardo,
Guillermo P. Ortiz,
José Samuel Pérez-Huerta,
Héctor Pérez-Aguilar,
Vivechana Agarwal,
W. Luis Mochán
Abstract:
The calculation of the transfer matrix for a large non-periodic multilayered system may become unstable in the presence of absorption. We discuss the origin of this instability and we explore two methods to overcome it: the use of a total matrix to solve for all the fields at all the interfaces simultaneously and an expansion in the Bloch-like modes of a periodic artificially repeated system. We a…
▽ More
The calculation of the transfer matrix for a large non-periodic multilayered system may become unstable in the presence of absorption. We discuss the origin of this instability and we explore two methods to overcome it: the use of a total matrix to solve for all the fields at all the interfaces simultaneously and an expansion in the Bloch-like modes of a periodic artificially repeated system. We apply both methods to obtain the reflectance spectra of multilayered chirped structures composed of nanostructured porous silicon (PS). Both methods yield reliable and numerically stable results. The former allows an analysis of the field within all layers while the latter is much more efficient computationally, allowing the design of novel structures and the optimization of their parameters. We compare numerical and experimental results across a wide spectral range from the infrared to the ultraviolet.
△ Less
Submitted 2 May, 2020;
originally announced May 2020.
-
Computer Aided Detection for Pulmonary Embolism Challenge (CAD-PE)
Authors:
Germán González,
Daniel Jimenez-Carretero,
Sara Rodríguez-López,
Carlos Cano-Espinosa,
Miguel Cazorla,
Tanya Agarwal,
Vinit Agarwal,
Nima Tajbakhsh,
Michael B. Gotway,
Jianming Liang,
Mojtaba Masoudi,
Noushin Eftekhari,
Mahdi Saadatmand,
Hamid-Reza Pourreza,
Patricia Fraga-Rivas,
Eduardo Fraile,
Frank J. Rybicki,
Ara Kassarjian,
Raúl San José Estépar,
Maria J. Ledesma-Carbayo
Abstract:
Rationale: Computer aided detection (CAD) algorithms for Pulmonary Embolism (PE) algorithms have been shown to increase radiologists' sensitivity with a small increase in specificity. However, CAD for PE has not been adopted into clinical practice, likely because of the high number of false positives current CAD software produces. Objective: To generate a database of annotated computed tomography…
▽ More
Rationale: Computer aided detection (CAD) algorithms for Pulmonary Embolism (PE) algorithms have been shown to increase radiologists' sensitivity with a small increase in specificity. However, CAD for PE has not been adopted into clinical practice, likely because of the high number of false positives current CAD software produces. Objective: To generate a database of annotated computed tomography pulmonary angiographies, use it to compare the sensitivity and false positive rate of current algorithms and to develop new methods that improve such metrics. Methods: 91 Computed tomography pulmonary angiography scans were annotated by at least one radiologist by segmenting all pulmonary emboli visible on the study. 20 annotated CTPAs were open to the public in the form of a medical image analysis challenge. 20 more were kept for evaluation purposes. 51 were made available post-challenge. 8 submissions, 6 of them novel, were evaluated on the 20 evaluation CTPAs. Performance was measured as per embolus sensitivity vs. false positives per scan curve. Results: The best algorithms achieved a per-embolus sensitivity of 75% at 2 false positives per scan (fps) or of 70% at 1 fps, outperforming the state of the art. Deep learning approaches outperformed traditional machine learning ones, and their performance improved with the number of training cases. Significance: Through this work and challenge we have improved the state-of-the art of computer aided detection algorithms for pulmonary embolism. An open database and an evaluation benchmark for such algorithms have been generated, easing the development of further improvements. Implications on clinical practice will need further research.
△ Less
Submitted 30 March, 2020;
originally announced March 2020.