-
A Formal Analysis of Algorithms for Matroids and Greedoids
Authors:
Mohammad Abdulaziz,
Thomas Ammer,
Shriya Meenakshisundaram,
Adem Rimpapa
Abstract:
We present a formal analysis, in Isabelle/HOL, of optimisation algorithms for matroids, which are useful generalisations of combinatorial structures that occur in optimisation, and greedoids, which are a generalisation of matroids. Although some formalisation work has been done earlier on matroids, our work here presents the first formalisation of results on greedoids, and many results we formalise in relation to matroids are also formalised for the first time in this work. We formalise the analysis of a number of optimisation algorithms for matroids and greedoids. We also derive from those algorithms executable implementations of Kruskal's algorithm for minimum spanning trees, an algorithm for maximum cardinality matching in bipartite graphs, and Prim's algorithm for computing minimum weight spanning trees.
Submitted 26 May, 2025;
originally announced May 2025.
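The abstract mentions deriving Kruskal's algorithm from the greedy algorithm on matroids. The Python sketch below is only an informal illustration, not the authors' verified Isabelle/HOL development: it runs the greedy scheme on the graphic matroid (whose independent sets are forests), scanning edges by increasing weight and using a union-find structure as the independence test. The function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[float, Hashable, Hashable]  # (weight, u, v)


def find(parent: Dict[Hashable, Hashable], x: Hashable) -> Hashable:
    """Return the representative of x's component (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def kruskal(vertices: Iterable[Hashable], edges: List[Edge]) -> List[Edge]:
    """Greedy algorithm on the graphic matroid: keep an edge iff the kept
    edge set stays independent, i.e. acyclic."""
    parent = {v: v for v in vertices}
    forest: List[Edge] = []
    for w, u, v in sorted(edges, key=lambda e: e[0]):  # cheapest edge first
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:  # u and v lie in different trees, so no cycle is created
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


edges = [(1, "a", "b"), (3, "b", "c"), (2, "a", "c"), (4, "c", "d")]
mst = kruskal("abcd", edges)
print(mst, sum(w for w, _, _ in mst))
```

Scanning by increasing weight yields a minimum spanning tree; scanning by decreasing weight would instead return the maximum-weight basis computed by the textbook matroid greedy algorithm.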
-
Disentangling Reasoning and Knowledge in Medical Large Language Models
Authors:
Rahul Thapa,
Qingyang Wu,
Kevin Wu,
Harrison Zhang,
Angela Zhang,
Eric Wu,
Haotian Ye,
Suhana Bedi,
Nevin Aresh,
Joseph Boen,
Shriya Reddy,
Ben Athiwaratkun,
Shuaiwen Leon Song,
James Zou
Abstract:
Medical reasoning in large language models (LLMs) aims to emulate clinicians' diagnostic thinking, but current benchmarks such as MedQA-USMLE, MedMCQA, and PubMedQA often mix reasoning with factual recall. We address this by separating 11 biomedical QA benchmarks into reasoning- and knowledge-focused subsets using a PubMedBERT classifier that reaches 81 percent accuracy, comparable to human performance. Our analysis shows that only 32.8 percent of questions require complex reasoning. We evaluate biomedical models (HuatuoGPT-o1, MedReason, m1) and general-domain models (DeepSeek-R1, o4-mini, Qwen3), finding consistent gaps between knowledge and reasoning performance. For example, m1 scores 60.5 on knowledge but only 47.1 on reasoning. In adversarial tests where models are misled with incorrect initial reasoning, biomedical models degrade sharply, while larger or RL-trained general models show more robustness. To address this, we train BioMed-R1 using fine-tuning and reinforcement learning on reasoning-heavy examples. It achieves the strongest performance among similarly sized models. Further gains may come from incorporating clinical case reports and training with adversarial and backtracking scenarios.
Submitted 16 May, 2025;
originally announced May 2025.
-
Multi-Objective Causal Bayesian Optimization
Authors:
Shriya Bhatija,
Paul-David Zuercher,
Jakob Thumm,
Thomas Bohné
Abstract:
In decision-making problems, the outcome of an intervention often depends on the causal relationships between system components and is highly costly to evaluate. In such settings, causal Bayesian optimization (CBO) can exploit the causal relationships between the system variables and sequentially perform interventions to approach the optimum with minimal data. Extending CBO to the multi-outcome setting, we propose Multi-Objective Causal Bayesian Optimization (MO-CBO), a paradigm for identifying Pareto-optimal interventions within a known multi-target causal graph. We first derive a graphical characterization for potentially optimal sets of variables to intervene upon. Showing that any MO-CBO problem can be decomposed into several traditional multi-objective optimization tasks, we then introduce an algorithm that sequentially balances exploration across these tasks using relative hypervolume improvement. The proposed method will be validated on both synthetic and real-world causal graphs, demonstrating its superiority over traditional (non-causal) multi-objective Bayesian optimization in settings where causal information is available.
Submitted 20 February, 2025;
originally announced February 2025.
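MO-CBO balances exploration using relative hypervolume improvement. As a hedged illustration of the underlying quantity only, the sketch below computes the two-objective hypervolume (for minimisation, with respect to a reference point) and the plain, unnormalised hypervolume improvement of a candidate point. How the paper normalises this into a relative improvement is not reproduced here, and all function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (objective 1, objective 2), both minimised


def pareto_front(points: List[Point]) -> List[Point]:
    """Non-dominated points, sorted by the first objective."""
    front = [p for p in points
             if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]
    return sorted(set(front))


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded above by the reference point `ref`."""
    front = [p for p in pareto_front(points) if p[0] < ref[0] and p[1] < ref[1]]
    area, prev_y = 0.0, ref[1]
    for x, y in front:  # x increases, so y decreases along the front
        area += (ref[0] - x) * (prev_y - y)  # horizontal strip [y, prev_y]
        prev_y = y
    return area


def hypervolume_improvement(points: List[Point], candidate: Point, ref: Point) -> float:
    """Gain in hypervolume from adding `candidate` (not normalised)."""
    return hypervolume_2d(points + [candidate], ref) - hypervolume_2d(points, ref)


front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
print(hypervolume_2d(front, (4.0, 4.0)))
print(hypervolume_improvement(front, (1.0, 1.0), (4.0, 4.0)))
```

A candidate that is dominated by the current front has zero improvement, which is why this quantity is a natural signal for deciding where to spend further evaluations.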
-
Beyond the Lens: Quantifying the Impact of Scientific Documentaries through Amazon Reviews
Authors:
Jill Naiman,
Aria Pessianzadeh,
Hanyu Zhao,
AJ Christensen,
Kalina Borkiewicz,
Shriya Srikanth,
Anushka Gami,
Emma Maxwell,
Louisa Zhang,
Sri Nithya Yeragorla,
Rezvaneh Rezapour
Abstract:
Engaging the public with science is critical for a well-informed population. Documentaries are a popular method of scientific communication. Once such works are released, it can be difficult to assess their impact on a large scale, due to the overhead required for in-depth audience feedback studies. In what follows, we outline an approach that complements qualitative studies: quantitative impact and sentiment analysis of Amazon reviews for several scientific documentaries. In addition to developing a novel impact category taxonomy for this analysis, we release a dataset containing 1296 human-annotated sentences from 1043 Amazon reviews for six movies created in whole or part by the Advanced Visualization Lab (AVL). This interdisciplinary team is housed at the National Center for Supercomputing Applications and consists of visualization designers who focus on cinematic presentations of scientific data. Using this data, we train and evaluate several machine learning and large language models, discussing their effectiveness and possible generalizability to documentaries beyond those studied in this work. Themes are also extracted from our annotated dataset which, along with our large language model analysis, demonstrate a measure of the ability of scientific documentaries to engage with the public.
Submitted 4 March, 2025; v1 submitted 12 February, 2025;
originally announced February 2025.
-
Pre-Ictal Seizure Prediction Using Personalized Deep Learning
Authors:
Shriya Jaddu,
Sidh Jaddu,
Camilo Gutierrez,
Quincy K. Tran
Abstract:
Introduction: Approximately 23 million, or 30%, of epilepsy patients worldwide suffer from drug-resistant epilepsy (DRE). The unpredictability of seizures, which causes safety issues as well as social concerns, restricts the lifestyles of DRE patients. Existing surgical and EEG-based solutions are very expensive, unreliable, invasive, or impractical. The goal of this research was to apply improved technologies and methods to physiological data from epilepsy patients and to predict seizures up to two hours before onset, enabling non-invasive, affordable seizure prediction for DRE patients.
Methods: This research used a 1D Convolutional Neural Network-Based Bidirectional Long Short-Term Memory network that was trained on a diverse set of epileptic patient physiological data to predict seizures. Transfer learning was further utilized to personalize and optimize predictions for specific patients. Clinical data was retrospectively obtained for nine epilepsy patients via wearable devices over a period of about three to five days from a prospectively maintained database. The physiological data included 54 seizure occurrences and included heart rate, blood volume pulse, accelerometry, body temperature, and electrodermal activity.
Results and Conclusion: A general deep-learning model trained on the physiological data with randomly sampled test data achieved an accuracy of 91.94%. However, such a generalized deep learning model had varied performances on data from unseen patients. When the general model was personalized (further trained) with patient-specific data, the personalized model achieved significantly improved performance with accuracies as high as 97%. This preliminary research shows that patient-specific personalization may be a viable approach to achieve affordable, non-invasive seizure prediction that can improve the quality of life for DRE patients.
Submitted 7 October, 2024;
originally announced October 2024.
-
Multilingual Retrieval Augmented Generation for Culturally-Sensitive Tasks: A Benchmark for Cross-lingual Robustness
Authors:
Bryan Li,
Fiona Luo,
Samar Haider,
Adwait Agashe,
Tammy Li,
Runqi Liu,
Muqing Miao,
Shriya Ramakrishnan,
Yuan Yuan,
Chris Callison-Burch
Abstract:
The paradigm of retrieval-augmented generation (RAG) helps mitigate hallucinations of large language models (LLMs). However, RAG also introduces biases contained within the retrieved documents. These biases can be amplified in scenarios that are multilingual and culturally sensitive, such as territorial disputes. In this paper, we introduce BordIRLines, a benchmark consisting of 720 territorial dispute queries paired with 14k Wikipedia documents across 49 languages. To evaluate LLMs' cross-lingual robustness for this task, we formalize several modes for multilingual retrieval. Our experiments on several LLMs reveal that, compared with using purely in-language documents, retrieving multilingual documents best improves response consistency and decreases geopolitical bias, showing how incorporating diverse perspectives improves robustness. Querying in low-resource languages also yields a much wider variance in the linguistic distribution of response citations. Our further experiments and case studies investigate how cross-lingual RAG is affected by factors ranging from IR to document contents. We release our benchmark and code to support further research towards ensuring equitable information access across languages at https://huggingface.co/datasets/borderlines/bordirlines.
Submitted 18 February, 2025; v1 submitted 1 October, 2024;
originally announced October 2024.
-
Autoencoded Image Compression for Secure and Fast Transmission
Authors:
Aryan Kashyap Naveen,
Sunil Thunga,
Anuhya Murki,
Mahati A Kalale,
Shriya Anil
Abstract:
With exponential growth in the use of digital image data, the need for efficient transmission methods has become imperative. Traditional image compression techniques often sacrifice image fidelity for reduced file sizes, making it challenging to maintain both quality and efficiency. They also compromise security, leaving images vulnerable to threats such as man-in-the-middle attacks. This paper proposes an autoencoder architecture for image compression that not only performs dimensionality reduction but also inherently encrypts the images. The paper also introduces a composite loss function that combines reconstruction loss and residual loss for improved performance. The autoencoder architecture is designed to achieve optimal dimensionality reduction and regeneration accuracy while safeguarding the compressed data during transmission or storage. Images regenerated by the autoencoder are evaluated against three key metrics: reconstruction quality, compression ratio, and one-way delay during image transfer. The experiments reveal that the proposed architecture achieves an SSIM of 97.5% over the regenerated images and an average latency reduction of 87.5%, indicating its effectiveness as a secure and efficient solution for compressed image transfer.
Submitted 14 October, 2024; v1 submitted 4 July, 2024;
originally announced July 2024.
-
Thread Detection and Response Generation using Transformers with Prompt Optimisation
Authors:
Kevin Joshua T,
Arnav Agarwal,
Shriya Sanjay,
Yash Sarda,
John Sahaya Rani Alex,
Saurav Gupta,
Sushant Kumar,
Vishwanath Kamath
Abstract:
Conversational systems are crucial for human-computer interaction, managing complex dialogues by identifying threads and prioritising responses. This is especially vital in multi-party conversations, where precise identification of threads and strategic response prioritisation ensure efficient dialogue management. To address these challenges, an end-to-end model that identifies threads and prioritises their response generation based on importance was developed, involving a systematic decomposition of the problem into discrete components (thread detection, prioritisation, and performance optimisation), each of which was meticulously analysed and optimised. These refined components integrate seamlessly into a unified framework for conversational systems. Llama2 7b is used due to its high level of generalisation, but the system can be updated with any open-source Large Language Model (LLM). The computational capabilities of the Llama2 model were augmented using fine-tuning methods and strategic prompting techniques to optimise the model's performance, reducing computational time and increasing the accuracy of the model. The model achieves up to 10x speed improvement while generating more coherent results compared to existing models.
Submitted 9 March, 2024;
originally announced March 2024.
-
Can an LLM-Powered Socially Assistive Robot Effectively and Safely Deliver Cognitive Behavioral Therapy? A Study With University Students
Authors:
Mina J. Kian,
Mingyu Zong,
Katrin Fischer,
Abhyuday Singh,
Anna-Maria Velentza,
Pau Sang,
Shriya Upadhyay,
Anika Gupta,
Misha A. Faruki,
Wallace Browning,
Sebastien M. R. Arnold,
Bhaskar Krishnamachari,
Maja J. Mataric
Abstract:
Cognitive behavioral therapy (CBT) is a widely used therapeutic method for guiding individuals toward restructuring their thinking patterns as a means of addressing anxiety, depression, and other challenges. We developed a large language model (LLM)-powered prompt-engineered socially assistive robot (SAR) that guides participants through interactive CBT at-home exercises. We evaluated the performance of the SAR through a 15-day study with 38 university students randomly assigned to interact daily with the robot or a chatbot (using the same LLM), or complete traditional CBT worksheets throughout the duration of the study. We measured weekly therapeutic outcomes, changes in pre-/post-session anxiety measures, and adherence to completing CBT exercises. We found that self-reported measures of general psychological distress significantly decreased over the study period in the robot and worksheet conditions but not the chatbot condition. Furthermore, the SAR enabled significant single-session improvements for more sessions than the other two conditions combined. Our findings suggest that SAR-guided LLM-powered CBT may be as effective as traditional worksheet methods in supporting therapeutic progress from the beginning to the end of the study and superior in decreasing user anxiety immediately after completing the CBT exercise.
Submitted 27 February, 2024;
originally announced February 2024.
-
The Hat Guessing Number of Cactus Graphs and Cycles
Authors:
Jeremy Chizewer,
I. M. J. McInnis,
Mehrdad Sohrabi,
Shriya Kaistha
Abstract:
We study the hat guessing game on graphs. In this game, a player is placed on each vertex $v$ of a graph $G$ and assigned a colored hat from $h(v)$ possible colors. Each player makes a deterministic guess on their hat color based on the colors assigned to the players on neighboring vertices, and the players win if at least one player correctly guesses their assigned color. If there exists a strategy that ensures at least one player guesses correctly for every possible assignment of colors, the game defined by $\langle G,h\rangle$ is called winning. The hat guessing number of $G$ is the largest integer $q$ such that if $h(v)=q$ for all $v\in G$ then $\langle G,h\rangle$ is winning.
In this note, we determine whether $\langle G,h\rangle$ is winning for any $h$ whenever $G$ is a cycle, resolving a conjecture of Kokhas and Latyshev in the affirmative and extending it. We then use this result to determine the hat guessing number of every cactus graph, that is, every graph in which any two cycles share at most one vertex.
Submitted 1 December, 2023;
originally announced December 2023.
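To make the definitions concrete, the brute-force sketch below decides whether $\langle G,h\rangle$ is winning by enumerating every deterministic strategy. It is exponential and only feasible for tiny graphs; it is not the method of the note, which proves results for all cycles and cactus graphs. The helper names are hypothetical. On a single edge ($K_2$) it recovers the known hat guessing number 2.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

Graph = Dict[int, List[int]]  # adjacency lists
View = Tuple[int, ...]        # colours a vertex sees on its neighbours


def strategies_for(v: int, adj: Graph, h: Dict[int, int]) -> Iterator[Dict[View, int]]:
    """Yield every deterministic guessing function of vertex v."""
    views = list(product(*(range(h[u]) for u in adj[v])))
    for guesses in product(range(h[v]), repeat=len(views)):
        yield dict(zip(views, guesses))


def wins(adj: Graph, h: Dict[int, int], strategy: Dict[int, Dict[View, int]]) -> bool:
    """True iff some player guesses correctly under every hat assignment."""
    vertices = sorted(adj)
    for colours in product(*(range(h[v]) for v in vertices)):
        hats = dict(zip(vertices, colours))
        if not any(strategy[v][tuple(hats[u] for u in adj[v])] == hats[v]
                   for v in vertices):
            return False
    return True


def is_winning(adj: Graph, h: Dict[int, int]) -> bool:
    """Search all strategy profiles for one that always wins."""
    vertices = sorted(adj)
    options = [list(strategies_for(v, adj, h)) for v in vertices]
    return any(wins(adj, h, dict(zip(vertices, combo))) for combo in product(*options))


def hat_guessing_number(adj: Graph, q_max: int = 3) -> int:
    """Largest uniform q (capped at q_max) for which the game is winning.
    Relies on winning being preserved when the number of colours decreases."""
    q = 1  # one colour per vertex is trivially winning
    while q < q_max and is_winning(adj, {v: q + 1 for v in adj}):
        q += 1
    return q


k2 = {0: [1], 1: [0]}
print(is_winning(k2, {0: 2, 1: 2}), hat_guessing_number(k2))
```

With two colours on $K_2$, one player guesses the colour it sees and the other guesses the opposite colour, so exactly one of them is always right; with three colours no strategy pair covers all nine assignments.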
-
Machine Learning Algorithms for Time Series Analysis and Forecasting
Authors:
Rameshwar Garg,
Shriya Barpanda,
Girish Rao Salanke N S,
Ramya S
Abstract:
Time series data is used everywhere, from sales records to metrics tracking the evolution of patients' health. The ability to deal with this data has become a necessity, and time series analysis and forecasting are used for this purpose. Any machine learning enthusiast would consider these very important tools, as they deepen the understanding of the characteristics of data. Forecasting is used to predict the future value of a variable based on its past values. This paper presents a detailed survey of the various methods used for forecasting. The complete forecasting process, from preprocessing to validation, is also explained thoroughly. Various statistical and deep learning models are considered, notably ARIMA, Prophet, and LSTMs. Hybrid versions of machine learning models are also explored and elucidated. Our work can be used by anyone to develop a good understanding of the forecasting process and to identify various state-of-the-art models in use today.
Submitted 25 November, 2022;
originally announced November 2022.
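The survey covers the forecasting pipeline from preprocessing to validation. One common preprocessing step for machine learning forecasters such as LSTMs is reframing a series as supervised (input window, target) pairs, and validation should respect time order rather than shuffle samples. The sketch below illustrates both steps; the names and parameters (make_windows, lookback, horizon, train_fraction) are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], lookback: int,
                 horizon: int = 1) -> Tuple[List[List[float]], List[float]]:
    """Pair each `lookback`-long window with the value `horizon` steps after it."""
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(lookback, len(series) - horizon + 1):
        X.append(list(series[t - lookback:t]))  # inputs: series[t-lookback .. t-1]
        y.append(series[t + horizon - 1])       # target: horizon steps past t-1
    return X, y


def chronological_split(X, y, train_fraction: float = 0.8):
    """Split without shuffling, so validation samples come later in time."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


X, y = make_windows([1, 2, 3, 4, 5], lookback=2)
(train_X, train_y), (val_X, val_y) = chronological_split(X, y)
print(X, y, val_y)
```

The same windows can feed a classical regressor or an LSTM; statistical models such as ARIMA are usually fitted on the raw series instead.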
-
Robustness of Explanation Methods for NLP Models
Authors:
Shriya Atmakuri,
Tejas Chheda,
Dinesh Kandula,
Nishant Yadav,
Taesung Lee,
Hessel Tuinhof
Abstract:
Explanation methods have emerged as an important tool to highlight the features responsible for the predictions of neural networks. There is mounting evidence that many explanation methods are rather unreliable and susceptible to malicious manipulations. In this paper, we particularly aim to understand the robustness of explanation methods in the context of text modality. We provide initial insights and results towards devising a successful adversarial attack against text explanations. To our knowledge, this is the first attempt to evaluate the adversarial robustness of an explanation method. Our experiments show the explanation method can be largely disturbed for up to 86% of the tested samples with small changes in the input sentence and its semantics.
Submitted 24 June, 2022;
originally announced June 2022.
-
Out of Distribution Detection on ImageNet-O
Authors:
Anugya Srivastava,
Shriya Jain,
Mugdha Thigle
Abstract:
Out-of-distribution (OOD) detection is a crucial part of making machine learning systems robust. The ImageNet-O dataset is an important tool for testing the robustness of ImageNet-trained deep neural networks, which are widely used across a variety of systems and applications. We aim to perform a comparative analysis of OOD detection methods on ImageNet-O, a first-of-its-kind dataset with a label distribution different from that of ImageNet, created to aid research in OOD detection for ImageNet models. As this dataset is fairly new, we aim to provide a comprehensive benchmarking of some of the current state-of-the-art OOD detection methods on this novel dataset. This benchmarking covers a variety of model architectures, settings where we have prior access to the OOD data versus when we don't, predictive score based approaches, deep generative approaches to OOD detection, and more.
Submitted 23 January, 2022;
originally announced January 2022.
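Among the predictive score based approaches mentioned, a widely used baseline is the maximum softmax probability (MSP): inputs whose largest class probability falls below a threshold are flagged as OOD. The sketch below is a minimal, framework-free illustration; the threshold is a hypothetical parameter that would in practice be tuned on held-out data, and MSP is not claimed to be one of the exact methods benchmarked in the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits: Sequence[float]) -> float:
    """Maximum softmax probability; low values suggest an OOD input."""
    return max(softmax(logits))


def flag_ood(batch_logits: Sequence[Sequence[float]], threshold: float) -> List[bool]:
    """True for each input whose MSP falls below the (hypothetical) threshold."""
    return [msp_score(logits) < threshold for logits in batch_logits]


print(flag_ood([[10.0, 0.0, 0.0], [0.1, 0.0, 0.0]], threshold=0.5))
```

A confident prediction (first row) passes, while a near-uniform one (second row) is flagged; ImageNet-O was built from images on which such confidence-based detectors tend to fail.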
-
Word2Box: Capturing Set-Theoretic Semantics of Words using Box Embeddings
Authors:
Shib Sankar Dasgupta,
Michael Boratko,
Siddhartha Mishra,
Shriya Atmakuri,
Dhruvesh Patel,
Xiang Lorraine Li,
Andrew McCallum
Abstract:
Learning representations of words in a continuous space is perhaps the most fundamental task in NLP; however, words interact in ways much richer than vector dot product similarity can capture. Many relationships between words can be expressed set-theoretically; for example, adjective-noun compounds (e.g. "red cars"$\subseteq$"cars") and homographs (e.g. "tongue"$\cap$"body" should be similar to "mouth", while "tongue"$\cap$"language" should be similar to "dialect") have natural set-theoretic interpretations. Box embeddings are a novel region-based representation that provides the capability to perform these set-theoretic operations. In this work, we provide a fuzzy-set interpretation of box embeddings, and learn box representations of words using a set-theoretic training objective. We demonstrate improved performance on various word similarity tasks, particularly on less common words, and perform a quantitative and qualitative analysis exploring the additional unique expressivity provided by Word2Box.
Submitted 8 June, 2022; v1 submitted 27 June, 2021;
originally announced June 2021.
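The subset relation "red cars"$\subseteq$"cars" can be read directly off box geometry: intersect two axis-aligned boxes and compare volumes. The sketch below uses crisp (hard) boxes for intuition only; Word2Box itself learns a fuzzy, smoothed variant suited to gradient-based training, which this sketch does not implement. The class, function names, and 2-D coordinates are made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Box:
    lo: Tuple[float, ...]  # minimum corner
    hi: Tuple[float, ...]  # maximum corner

    def volume(self) -> float:
        """Product of side lengths; empty boxes (hi < lo) have volume 0."""
        v = 1.0
        for a, b in zip(self.lo, self.hi):
            v *= max(b - a, 0.0)
        return v


def intersect(a: Box, b: Box) -> Box:
    """Set intersection of two boxes (possibly empty)."""
    return Box(tuple(max(x, y) for x, y in zip(a.lo, b.lo)),
               tuple(min(x, y) for x, y in zip(a.hi, b.hi)))


def containment(a: Box, b: Box) -> float:
    """Fraction of b's volume lying inside a: a hard-box analogue of P(a | b)."""
    vb = b.volume()
    return intersect(a, b).volume() / vb if vb > 0 else 0.0


cars = Box((0.0, 0.0), (4.0, 4.0))
red_cars = Box((1.0, 1.0), (2.0, 2.0))
print(containment(cars, red_cars), containment(red_cars, cars))
```

Containment is asymmetric: every red car is a car (score 1.0), but only a small fraction of the "cars" region is red cars (score 0.0625). A dot product cannot express this asymmetry.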
-
Foveal-pit inspired filtering of DVS spike response
Authors:
Shriya T. P. Gupta,
Pablo Linares-Serrano,
Basabdatta Sen Bhattacharya,
Teresa Serrano-Gotarredona
Abstract:
In this paper, we present results of processing Dynamic Vision Sensor (DVS) recordings of visual patterns with a retinal model based on foveal-pit inspired Difference of Gaussian (DoG) filters. A DVS sensor was stimulated with a varying number of vertical white and black bars of different spatial frequencies moving horizontally at a constant velocity. The output spikes generated by the DVS sensor were applied as input to a set of DoG filters inspired by the receptive field structure of the primate visual pathway. In particular, these filters mimic the receptive fields of the midget and parasol ganglion cells (spiking neurons of the retina) that sub-serve the photo-receptors of the foveal-pit. The features extracted with the foveal-pit model are then classified using a spiking convolutional neural network trained with a backpropagation variant adapted for spiking neural networks.
Submitted 29 May, 2021;
originally announced May 2021.
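The DoG filters model centre-surround receptive fields: a narrow excitatory centre Gaussian minus a wider inhibitory surround Gaussian. The sketch below builds such an ON-centre kernel and applies it to a small 2-D frame, which is assumed here to be an accumulated DVS event count. The kernel size and sigma values are hypothetical placeholders, not the paper's midget or parasol parameters.

```python
import math
from typing import List

Grid = List[List[float]]


def gaussian(x: int, y: int, sigma: float) -> float:
    """Normalised 2-D isotropic Gaussian evaluated at offset (x, y)."""
    return math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)


def dog_kernel(size: int, sigma_center: float, sigma_surround: float) -> Grid:
    """ON-centre DoG: narrow centre Gaussian minus wider surround Gaussian."""
    r = size // 2
    return [[gaussian(x, y, sigma_center) - gaussian(x, y, sigma_surround)
             for x in range(-r, r + 1)] for y in range(-r, r + 1)]


def filter_valid(image: Grid, kernel: Grid) -> Grid:
    """Slide the kernel over the image without padding ('valid' region).
    This is cross-correlation, identical to convolution for a symmetric DoG."""
    k = len(kernel)
    return [[sum(kernel[a][b] * image[i + a][j + b] for a in range(k) for b in range(k))
             for j in range(len(image[0]) - k + 1)]
            for i in range(len(image) - k + 1)]


kernel = dog_kernel(5, sigma_center=1.0, sigma_surround=2.0)
frame = [[0.0] * 5 for _ in range(5)]
frame[2][2] = 1.0  # a single event at the centre
print(filter_valid(frame, kernel))
```

A bright spot on the centre excites the filter, while the same spot in the surround inhibits it; an OFF-centre cell would use the negated kernel.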
-
Implementing a foveal-pit inspired filter in a Spiking Convolutional Neural Network: a preliminary study
Authors:
Shriya T. P. Gupta,
Basabdatta Sen Bhattacharya
Abstract:
We have presented a Spiking Convolutional Neural Network (SCNN) that incorporates retinal foveal-pit inspired Difference of Gaussian filters and rank-order encoding. The model is trained using a variant of the backpropagation algorithm adapted to work with spiking neurons, as implemented in the Nengo library. We have evaluated the performance of our model on two publicly available datasets - one for digit recognition task, and the other for vehicle recognition task. The network has achieved up to 90% accuracy, where loss is calculated using the cross-entropy function. This is an improvement over around 57% accuracy obtained with the alternate approach of performing the classification without any kind of neural filtering. Overall, our proof-of-concept study indicates that introducing biologically plausible filtering in existing SCNN architecture will work well with noisy input images such as those in our vehicle recognition task. Based on our results, we plan to enhance our SCNN by integrating lateral inhibition-based redundancy reduction prior to rank-ordering, which will further improve the classification accuracy by the network.
Submitted 29 May, 2021;
originally announced May 2021.
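In rank-order encoding, information is carried by the order in which units first spike, with stronger filter responses firing earlier. A minimal sketch, assuming the DoG responses are already available as a flat list and using a hypothetical firing threshold and time step; this is not the Nengo-based implementation used in the paper.

```python
from typing import Dict, List, Sequence


def rank_order_encode(responses: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Indices of units whose response exceeds `threshold`, strongest first.
    Ties keep their original index order because sorted() is stable."""
    active = [i for i, r in enumerate(responses) if r > threshold]
    return sorted(active, key=lambda i: -responses[i])


def spike_times(responses: Sequence[float], threshold: float = 0.0,
                dt: float = 1.0) -> Dict[int, float]:
    """Map each firing unit to a spike time proportional to its rank."""
    order = rank_order_encode(responses, threshold)
    return {i: rank * dt for rank, i in enumerate(order)}


responses = [0.2, 0.9, -0.1, 0.5]
print(rank_order_encode(responses), spike_times(responses))
```

Only the order survives the encoding, not the magnitudes, which makes the code invariant to global contrast changes but discards absolute intensity.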
-
Comparison of Privacy-Preserving Distributed Deep Learning Methods in Healthcare
Authors:
Manish Gawali,
Arvind C S,
Shriya Suryavanshi,
Harshit Madaan,
Ashrika Gaikwad,
Bhanu Prakash KN,
Viraj Kulkarni,
Aniruddha Pant
Abstract:
In this paper, we compare three privacy-preserving distributed learning techniques: federated learning, split learning, and SplitFed. We use these techniques to develop binary classification models for detecting tuberculosis from chest X-rays and compare them in terms of classification performance, communication and computational costs, and training time. We propose a novel distributed learning architecture called SplitFedv3, which performs better than split learning and SplitFedv2 in our experiments. We also propose alternate mini-batch training, a new training technique for split learning, that performs better than alternate client training, where clients take turns to train a model.
Submitted 23 December, 2020;
originally announced December 2020.
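Of the three techniques compared, federated learning is the simplest to illustrate: clients train locally and a server averages their parameters, weighted by local dataset size (the FedAvg rule). The sketch below shows only that aggregation step, with parameters stored as plain lists under hypothetical layer names; whether the paper used exactly this weighting is an assumption. Split learning and SplitFed instead cut the network at a layer and exchange activations and gradients, which is not shown.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # layer name -> flattened weights


def fed_avg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its number of samples."""
    total = sum(client_sizes)
    return {
        name: [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(client_params[0][name]))
        ]
        for name in client_params[0]
    }


clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
print(fed_avg(clients, client_sizes=[1, 3]))
```

Only parameters leave each client, never raw chest X-rays, which is the privacy argument shared by all three techniques.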
-
HPERL: 3D Human Pose Estimation from RGB and LiDAR
Authors:
Michael Fürst,
Shriya T. P. Gupta,
René Schuster,
Oliver Wasenmüller,
Didier Stricker
Abstract:
In-the-wild human pose estimation has huge potential in various fields, ranging from animation and action recognition to intention recognition and prediction for autonomous driving. The current state of the art focuses only on RGB and RGB-D approaches for predicting the 3D human pose. However, not using precise LiDAR depth information limits the performance and leads to very inaccurate absolute pose estimation. With LiDAR sensors becoming more affordable and common on robots and autonomous vehicle setups, we propose an end-to-end architecture using RGB and LiDAR to predict the absolute 3D human pose with unprecedented precision. Additionally, we introduce a weakly-supervised approach to generate 3D predictions using 2D pose annotations from PedX [1]. This allows for many new opportunities in the field of 3D human pose estimation.
Submitted 16 October, 2020;
originally announced October 2020.
-
A Novel Spatial-Spectral Framework for the Classification of Hyperspectral Satellite Imagery
Authors:
Shriya TP Gupta,
Sanjay K Sahay
Abstract:
Hyperspectral satellite imagery is now widely used for accurate disaster prediction and terrain feature classification. However, in such classification tasks, most present approaches use only the spectral information contained in the images. Therefore, in this paper, we present a novel framework that takes into account both the spectral and spatial information contained in the data for land cover classification. For this purpose, we use the Gaussian Maximum Likelihood (GML) and Convolutional Neural Network methods for pixel-wise spectral classification, and then, using segmentation maps generated by the Watershed algorithm, we incorporate the spatial contextual information into our model with a modified majority vote technique. Experimental analyses on two benchmark datasets demonstrate that our proposed methodology outperforms earlier approaches, achieving accuracies of 99.52% and 98.31% on the Pavia University and Indian Pines datasets, respectively. Additionally, our GML-based approach, a non-deep-learning algorithm, shows performance comparable to state-of-the-art deep learning techniques, which indicates the value of the proposed approach for computationally efficient classification of hyperspectral imagery.
Submitted 22 July, 2020;
originally announced August 2020.
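The spatial step in this abstract assigns each watershed segment a single class derived from the pixel-wise spectral predictions inside it. The sketch below implements only a plain majority vote. The abstract does not describe the paper's specific modification, so the sketch does not reproduce it. Labels and segment ids are plain nested lists here. The helper `segment_majority_vote` and its tie-breaking rule (the smallest class id wins) are assumptions for illustration.

```python
from collections import Counter


def segment_majority_vote(pixel_labels, segments):
    """Plain (unmodified) majority vote: relabel each pixel with the most frequent class in its segment.

    pixel_labels: 2D list of per-pixel class ids, e.g. from a spectral classifier.
    segments: 2D list of the same shape with segment ids, e.g. from a watershed segmentation.
    """
    # Count the predicted classes of all pixels in each segment.
    votes = {}
    for label_row, seg_row in zip(pixel_labels, segments):
        for label, seg in zip(label_row, seg_row):
            votes.setdefault(seg, Counter())[label] += 1
    # Winning class per segment: highest count, ties broken toward the smallest class id.
    winner = {seg: min(c, key=lambda k: (-c[k], k)) for seg, c in votes.items()}
    # Relabel every pixel with its segment's winning class.
    return [[winner[seg] for seg in seg_row] for seg_row in segments]


pixel_labels = [[0, 0, 1], [0, 2, 2], [1, 2, 2]]
segments = [[0, 0, 0], [0, 1, 1], [1, 1, 1]]
print(segment_majority_vote(pixel_labels, segments))  # → [[0, 0, 0], [0, 2, 2], [2, 2, 2]]
```

The isolated pixel-wise predictions (the 1s) are absorbed by the dominant class of their segment. This is the kind of spatial smoothing the abstract attributes to the segmentation-based vote.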
-
Recognition of Advertisement Emotions with Application to Computational Advertising
Authors:
Abhinav Shukla,
Shruti Shriya Gullapuram,
Harish Katti,
Mohan Kankanhalli,
Stefan Winkler,
Ramanathan Subramanian
Abstract:
Advertisements (ads) often contain strong affective content to capture viewer attention and convey an effective message to the audience. However, most computational affect recognition (AR) approaches examine ads via the text modality, and only limited work has been devoted to decoding ad emotions from audiovisual or user cues. This work (1) compiles an affective ad dataset capable of evoking coherent emotions across users; (2) explores the efficacy of content-centric convolutional neural network (CNN) features for AR vis-à-vis handcrafted audiovisual descriptors; (3) examines user-centric ad AR from electroencephalogram (EEG) responses acquired during ad viewing; and (4) demonstrates how better affect predictions facilitate effective computational advertising, as determined by a study involving 18 users. Experiments reveal that (a) CNN features outperform audiovisual descriptors for content-centric AR; (b) EEG features encode ad-induced emotions better than content-based features; (c) multi-task learning performs best among a slew of classification algorithms to achieve optimal AR; and (d) pursuant to (b), EEG features also enable optimized ad insertion into streamed video, compared to content-based or manual insertion techniques, in terms of ad memorability and overall user experience.
Submitted 3 April, 2019;
originally announced April 2019.
-
Engagement Estimation in Advertisement Videos with EEG
Authors:
Sangeetha Balasubramanian,
Shruti Shriya Gullapuram,
Abhinav Shukla
Abstract:
Engagement is a vital metric in the advertising industry, and its automatic estimation has huge commercial implications. This work presents a simple framework for engagement estimation using EEG (electroencephalography) data recorded specifically while watching advertisement videos, and is meant to be a first step in a promising line of research. The system combines recent advances in low-cost commercial Brain-Computer Interfaces with modeling of user engagement in response to advertisement videos. We achieve an F1 score of nearly 0.7 for binary classification of high and low values of self-reported engagement from multiple users. This study illustrates the possibility of seamless engagement measurement in the wild when interacting with media using a non-invasive and readily available commercial EEG device. Performing engagement measurement via implicit tagging in this manner, with direct feedback from physiological signals and thus requiring no additional human effort, demonstrates a novel and potentially commercially relevant application in the area of advertisement video analysis.
Submitted 8 December, 2018;
originally announced December 2018.
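This abstract reports an F1 score of nearly 0.7 for binary classification of high versus low self-reported engagement. The sketch below shows what that metric measures: it computes F1 for one positive class from true and predicted labels. Treating "high engagement" as the positive class is an assumption, because the abstract does not say whether the score is per-class or averaged. `binary_f1` is a hypothetical helper, not the authors' code, and the labels in the usage line are toy values, not data from the study.

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1 score of the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        return 0.0  # precision and recall are 0 or undefined, so report 0
    # Algebraically equal to 2 * precision * recall / (precision + recall).
    return 2 * tp / (2 * tp + fp + fn)


# Toy labels (1 = high engagement, 0 = low), not data from the study.
print(binary_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # → 0.6666666666666666
```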
-
Evaluating Content-centric vs User-centric Ad Affect Recognition
Authors:
Abhinav Shukla,
Shruti Shriya Gullapuram,
Harish Katti,
Karthik Yadati,
Mohan Kankanhalli,
Ramanathan Subramanian
Abstract:
Despite the fact that advertisements (ads) often include strongly emotional content, very little work has been devoted to affect recognition (AR) from ads. This work explicitly compares content-centric and user-centric ad AR methodologies, and evaluates the impact of enhanced AR on computational advertising via a user study. Specifically, we (1) compile an affective ad dataset capable of evoking coherent emotions across users; (2) explore the efficacy of content-centric convolutional neural network (CNN) features for encoding emotions, and show that CNN features outperform low-level emotion descriptors; (3) examine user-centered ad AR by analyzing Electroencephalogram (EEG) responses acquired from eleven viewers, and find that EEG signals encode emotional information better than content descriptors; (4) investigate the relationship between objective AR and subjective viewer experience while watching an ad-embedded online video stream based on a study involving 12 users. To our knowledge, this is the first work to (a) expressly compare user vs content-centered AR for ads, and (b) study the relationship between modeling of ad emotions and its impact on a real-life advertising application.
Submitted 6 September, 2017;
originally announced September 2017.
-
Affect Recognition in Ads with Application to Computational Advertising
Authors:
Abhinav Shukla,
Shruti Shriya Gullapuram,
Harish Katti,
Karthik Yadati,
Mohan Kankanhalli,
Ramanathan Subramanian
Abstract:
Advertisements (ads) often include strongly emotional content to leave a lasting impression on the viewer. This work (i) compiles an affective ad dataset capable of evoking coherent emotions across users, as determined from the affective opinions of five experts and 14 annotators; (ii) explores the efficacy of convolutional neural network (CNN) features for encoding emotions, and observes that CNN features outperform low-level audio-visual emotion descriptors upon extensive experimentation; and (iii) demonstrates how enhanced affect prediction facilitates computational advertising and leads to a better viewing experience while watching an online video stream embedded with ads, based on a study involving 17 users. We model ad emotions based on subjective human opinions as well as objective multimodal features, and show how effectively modeling ad emotions can positively impact a real-life application.
Submitted 6 September, 2017;
originally announced September 2017.
-
Stepping Forward with Exoskeletons: Team IHMC's Design and Approach in the 2016 Cybathlon
Authors:
Robert Griffin,
Tyson Cobb,
Travis Craig,
Mark Daniel,
Nick van Dijk,
Jeremy Gines,
Koen Kramer,
Shriya Shah,
Olger Siebinga,
Jesper Smith,
Peter Neuhaus
Abstract:
Exoskeletons are a promising technology that enables individuals with mobility limitations to walk again. As the 2016 Cybathlon illustrated, however, the community has a considerable way to go before exoskeletons have the necessary capabilities to be incorporated into daily life. While most exoskeletons power only hip and knee flexion, Team Institute for Human and Machine Cognition (IHMC) presents a new exoskeleton, Mina v2, which includes powered ankle dorsi/plantar flexion. As our entry to the 2016 Cybathlon Powered Exoskeleton Competition, Mina v2's performance allowed us to explore the effectiveness of its powered ankle compared to other powered exoskeletons for pilots with paraplegia. We designed our gaits to incorporate powered ankle plantar flexion to help improve mobility, which allowed our pilot to navigate the given Cybathlon tasks quickly, including those that required ascending movements, and to reliably achieve average, conservative walking speeds of 1.04 km/h (0.29 m/s). This enabled our team to place second overall in the Powered Exoskeleton Competition at the 2016 Cybathlon.
Submitted 24 December, 2017; v1 submitted 28 February, 2017;
originally announced February 2017.
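The two walking-speed figures quoted in this abstract agree under the standard conversion 1 km/h = 1000 m / 3600 s:

```latex
v = 1.04\,\tfrac{\text{km}}{\text{h}}
  = 1.04 \times \frac{1000\,\text{m}}{3600\,\text{s}}
  = \frac{1.04}{3.6}\,\tfrac{\text{m}}{\text{s}}
  \approx 0.289\,\tfrac{\text{m}}{\text{s}}
  \approx 0.29\,\tfrac{\text{m}}{\text{s}}
```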