-
The Amazon Nova Family of Models: Technical Report and Model Card
Authors:
Amazon AGI,
Aaron Langford,
Aayush Shah,
Abhanshu Gupta,
Abhimanyu Bhatter,
Abhinav Goyal,
Abhinav Mathur,
Abhinav Mohanty,
Abhishek Kumar,
Abhishek Sethi,
Abi Komma,
Abner Pena,
Achin Jain,
Adam Kunysz,
Adam Opyrchal,
Adarsh Singh,
Aditya Rawal,
Adok Achar Budihal Prasad,
Adrià de Gispert,
Agnika Kumar,
Aishwarya Aryamane,
Ajay Nair,
Akilan M,
Akshaya Iyengar,
Akshaya Vishnu Kudlu Shanbhogue,
et al. (761 additional authors not shown)
Abstract:
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional-grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.
Submitted 17 March, 2025;
originally announced June 2025.
-
A Comparative Study of U-Net Architectures for Change Detection in Satellite Images
Authors:
Yaxita Amin,
Naimisha S Trivedi,
Rashmi Bhattad
Abstract:
Remote sensing change detection is essential for monitoring the ever-changing landscapes of the Earth. The U-Net architecture has gained popularity for its capability to capture spatial information and perform pixel-wise classification. However, its application in the remote sensing field remains largely unexplored. This paper therefore fills the gap by conducting a comprehensive analysis of 34 papers. The study compares and analyzes 18 different U-Net variations, assessing their potential for detecting changes in remote sensing. We evaluate both the benefits and the drawbacks of each variation within the framework of this particular application. We emphasize variations that are explicitly built for change detection, such as Siamese Swin-U-Net, which utilizes a Siamese architecture. The analysis highlights the significance of aspects such as managing data from different time periods and capturing long-range relationships to enhance the precision of change detection. This study provides valuable insights for researchers and practitioners who choose U-Net versions for remote sensing change detection tasks.
Submitted 9 June, 2025;
originally announced June 2025.
-
LLM assisted web application functional requirements generation: A case study of four popular LLMs over a Mess Management System
Authors:
Rashmi Gupta,
Aditya K Gupta,
Aarav Jain,
Avinash C Pandey,
Atul Gupta
Abstract:
As in many other disciplines, Large Language Models (LLMs) have significantly impacted software engineering by helping developers generate the required artifacts across various phases of software development. This paper presents a case study comparing the performance of four popular LLMs (GPT, Claude, Gemini, and DeepSeek) in generating functional specifications that include use cases, business rules, and collaborative workflows for a web application, the Mess Management System. The study evaluated the quality of LLM-generated use cases, business rules, and collaborative workflows in terms of their syntactic and semantic correctness, consistency, non-ambiguity, and completeness compared to the reference specifications against the zero-shot prompted problem statement. Our results suggest that all four LLMs can specify syntactically and semantically correct, mostly non-ambiguous artifacts. Still, they may be inconsistent at times and may differ significantly in the completeness of the generated specification. Claude and Gemini generated all the reference use cases, with Claude achieving the most complete but somewhat redundant use case specifications. Similar results were obtained for specifying workflows. However, all four LLMs struggled to generate relevant business rules, with DeepSeek generating the most reference rules but with less completeness. Overall, Claude generated more complete specification artifacts, while Gemini was more precise in the specifications it generated.
Submitted 23 May, 2025;
originally announced May 2025.
-
FairGen: Controlling Sensitive Attributes for Fair Generations in Diffusion Models via Adaptive Latent Guidance
Authors:
Mintong Kang,
Vinayshekhar Bannihatti Kumar,
Shamik Roy,
Abhishek Kumar,
Sopan Khosla,
Balakrishnan Murali Narayanaswamy,
Rashmi Gangadharaiah
Abstract:
Text-to-image diffusion models often exhibit biases toward specific demographic groups, such as generating more males than females when prompted to generate images of engineers, raising ethical concerns and limiting their adoption. In this paper, we tackle the challenge of mitigating generation bias towards any target attribute value (e.g., "male" for "gender") in diffusion models while preserving generation quality. We propose FairGen, an adaptive latent guidance mechanism which controls the generation distribution during inference. In FairGen, a latent guidance module dynamically adjusts the diffusion process to enforce specific attributes, while a memory module tracks the generation statistics and steers latent guidance to align with the targeted fair distribution of the attribute values. Further, given the limitations of existing datasets in comprehensively assessing bias in diffusion models, we introduce a holistic bias evaluation benchmark HBE, covering diverse domains and incorporating complex prompts across various applications. Extensive evaluations on HBE and Stable Bias datasets demonstrate that FairGen outperforms existing bias mitigation approaches, achieving substantial bias reduction (e.g., 68.5% gender bias reduction on Stable Diffusion 2). Ablation studies highlight FairGen's ability to flexibly and precisely control generation distribution at any user-specified granularity, ensuring adaptive and targeted bias mitigation.
Submitted 25 February, 2025;
originally announced March 2025.
-
Stroke classification using Virtual Hybrid Edge Detection from in silico electrical impedance tomography data
Authors:
Juan Pablo Agnelli,
Fernando S. Moura,
Siiri Rautio,
Melody Alsaker,
Rashmi Murthy,
Matti Lassas,
Samuli Siltanen
Abstract:
Electrical impedance tomography (EIT) is a non-invasive imaging method for recovering the internal conductivity of a physical body from electric boundary measurements. EIT combined with machine learning has shown promise for the classification of strokes. However, most previous works have used raw EIT voltage data as network inputs. We build upon a recent development which suggested the use of special noise-robust Virtual Hybrid Edge Detection (VHED) functions as network inputs, although that work used only highly simplified and mathematically ideal models. In this work we strengthen the case for the use of EIT, and VHED functions especially, for stroke classification. We design models with high detail and mathematical realism to test the use of VHED functions as inputs. Virtual patients are created using a physically detailed 2D head model which includes features known to create challenges in real-world imaging scenarios. Conductivity values are drawn from statistically realistic distributions, and phantoms are afflicted with either hemorrhagic or ischemic strokes of various shapes and sizes. Simulated noisy EIT electrode data, generated using the realistic Complete Electrode Model (CEM) as opposed to the mathematically ideal continuum model, is processed to obtain VHED functions. We compare the use of VHED functions as inputs against the alternative paradigm of using raw EIT voltages. Our results show that (i) stroke classification can be performed with high accuracy using 2D EIT data from physically detailed and mathematically realistic models, and (ii) in the presence of noise, VHED functions outperform raw data as network inputs.
Submitted 29 January, 2025; v1 submitted 24 January, 2025;
originally announced January 2025.
-
Implementation Of Wildlife Observation System
Authors:
Neethu K N,
Rakshitha Y Nayak,
Rashmi,
Meghana S
Abstract:
By entering the habitats of wild animals, wildlife watchers can engage closely with them. However, some wild animals are not always safe to approach. We therefore propose this system for observing wildlife. Users can view live events on Android phones, so wildlife observers can get a close-up view of wild animals by employing this robotic vehicle. Commands are delivered to the system via a Wi-Fi module. As we developed the technology to enable our robot to deal with the challenges of maintaining continuous surveillance of a target, we found that the robot needed to move silently and purposefully when monitoring a natural target without being noticed. After processing the data, the computer sends commands to turn on the motors. The motor drivers, which deliver the signal outputs needed for vehicle movement, then drive the motors.
Submitted 8 January, 2025;
originally announced January 2025.
-
Constrained Decoding with Speculative Lookaheads
Authors:
Nishanth Nakshatri,
Shamik Roy,
Rajarshi Das,
Suthee Chaidaroon,
Leonid Boytsov,
Rashmi Gangadharaiah
Abstract:
Constrained decoding with lookahead heuristics (CDLH) is a highly effective method for aligning LLM generations to human preferences. However, the extensive lookahead roll-out operations for each generated token make CDLH prohibitively expensive, resulting in low adoption in practice. In contrast, common decoding strategies such as greedy decoding are extremely efficient, but achieve very low constraint satisfaction. We propose constrained decoding with speculative lookaheads (CDSL), a technique that significantly improves upon the inference efficiency of CDLH without experiencing the drastic performance reduction seen with greedy decoding. CDSL is motivated by the recently proposed idea of speculative decoding, which uses a much smaller draft LLM for generation and a larger target LLM for verification. In CDSL, the draft model is used to generate lookaheads, which are verified by a combination of the target LLM and task-specific reward functions. This process accelerates decoding by reducing the computational burden while maintaining strong performance. We evaluate CDSL in two constrained decoding tasks with three LLM families and achieve 2.2x to 12.15x speedup over CDLH without significant performance reduction.
Submitted 10 February, 2025; v1 submitted 9 December, 2024;
originally announced December 2024.
-
A Multi-way Parallel Named Entity Annotated Corpus for English, Tamil and Sinhala
Authors:
Surangika Ranathunga,
Asanka Ranasinghea,
Janaka Shamala,
Ayodya Dandeniyaa,
Rashmi Galappaththia,
Malithi Samaraweeraa
Abstract:
This paper presents a multi-way parallel English-Tamil-Sinhala corpus annotated with Named Entities (NEs), where Sinhala and Tamil are low-resource languages. Using pre-trained multilingual Language Models (mLMs), we establish new benchmark Named Entity Recognition (NER) results on this dataset for Sinhala and Tamil. We also carry out a detailed investigation on the NER capabilities of different types of mLMs. Finally, we demonstrate the utility of our NER system on a low-resource Neural Machine Translation (NMT) task. Our dataset is publicly released: https://github.com/suralk/multiNER.
Submitted 14 January, 2025; v1 submitted 2 December, 2024;
originally announced December 2024.
-
ReputeStream: Mitigating Free-Riding through Reputation-Based Multi-Layer P2P Live Streaming
Authors:
Rashmi Kushwaha,
Rahul Bhattacharyya,
Yatindra Nath Singh
Abstract:
This paper presents a novel algorithm for peer-to-peer (P2P) live streaming that addresses the limitations of single-layer systems through a multi-layered approach. The proposed solution adapts to diverse user capabilities and bandwidth conditions while tackling common P2P challenges such as free-riding, malicious behavior, churn, and flash crowds. By implementing a reputation-based system, the algorithm promotes fair resource sharing and active participation. The algorithm also incorporates a request-to-join mechanism to effectively manage flash crowds. In addition, a dynamic reputation system improves network efficiency by strategically positioning high-reputation peers closer to video sources or other significant contributors.
Submitted 28 November, 2024;
originally announced November 2024.
-
Still Not Quite There! Evaluating Large Language Models for Comorbid Mental Health Diagnosis
Authors:
Amey Hengle,
Atharva Kulkarni,
Shantanu Patankar,
Madhumitha Chandrasekaran,
Sneha D'Silva,
Jemima Jacob,
Rashmi Gupta
Abstract:
In this study, we introduce ANGST, a novel, first-of-its-kind benchmark for depression-anxiety comorbidity classification from social media posts. Unlike contemporary datasets that often oversimplify the intricate interplay between different mental health disorders by treating them as isolated conditions, ANGST enables multi-label classification, allowing each post to be simultaneously identified as indicating depression and/or anxiety. Comprising 2876 posts meticulously annotated by expert psychologists and an additional 7667 silver-labeled posts, ANGST provides a more representative sample of online mental health discourse. Moreover, we benchmark ANGST using various state-of-the-art language models, ranging from Mental-BERT to GPT-4. Our results provide significant insights into the capabilities and limitations of these models in complex diagnostic scenarios. While GPT-4 generally outperforms other models, none achieves an F1 score exceeding 72% in multi-class comorbid classification, underscoring the ongoing challenges in applying language models to mental health diagnostics.
Submitted 4 October, 2024;
originally announced October 2024.
-
ECHO: Environmental Sound Classification with Hierarchical Ontology-guided Semi-Supervised Learning
Authors:
Pranav Gupta,
Raunak Sharma,
Rashmi Kumari,
Sri Krishna Aditya,
Shwetank Choudhary,
Sumit Kumar,
Kanchana M,
Thilagavathy R
Abstract:
Environmental sound classification has been a well-studied research problem in the field of signal processing, and until now more focus has been laid on fully supervised approaches. Over the last few years, focus has moved towards semi-supervised methods, which concentrate on the utilization of unlabeled data, and self-supervised methods, which learn the intermediate representation through pretext tasks or contrastive learning. However, both approaches require a vast amount of unlabeled data to improve performance. In this work, we propose a novel framework called Environmental Sound Classification with Hierarchical Ontology-guided semi-supervised Learning (ECHO) that utilizes label ontology-based hierarchy to learn semantic representation by defining a novel pretext task. In the pretext task, the model tries to predict coarse labels defined by the Large Language Model (LLM) based on ground truth label ontology. The trained model is further fine-tuned in a supervised way to predict the actual task. Our proposed novel semi-supervised framework achieves an accuracy improvement in the range of 1% to 8% over baseline systems across three datasets, namely UrbanSound8K, ESC-10, and ESC-50.
Submitted 21 September, 2024;
originally announced September 2024.
-
Reputation-Driven Peer-to-Peer Live Streaming Architecture for Preventing Free-Riding
Authors:
Rashmi Kushwaha,
Rahul Bhattacharyya,
Yatindra Nath Singh
Abstract:
We present a peer-to-peer (P2P) live-streaming architecture designed to address challenges such as free-riding, malicious peers, churn, and network instability through the integration of a reputation system. The proposed algorithm incentivizes active peer participation while discouraging opportunistic behaviors, with a reputation mechanism that rewards altruistic peers and penalizes free riders and malicious actors. To manage peer dynamics, the algorithm continuously updates the strategies and adjusts to changing neighbors. It also implements a request-to-join mechanism for flash crowd scenarios, allowing the source node to delegate requests to child nodes, forming an interconnected tree structure that efficiently handles high demand and maintains system stability. The decentralized reputation mechanism promotes long-term sustainability in the P2P live streaming system.
Submitted 14 September, 2024;
originally announced September 2024.
-
ORS: A novel Olive Ridley Survival inspired Meta-heuristic Optimization Algorithm
Authors:
Niranjan Panigrahi,
Sourav Kumar Bhoi,
Debasis Mohapatra,
Rashmi Ranjan Sahoo,
Kshira Sagar Sahoo,
Anil Mohapatra
Abstract:
Meta-heuristic algorithmic development has been a thrust area of research since its inception. In this paper, a novel meta-heuristic optimization algorithm, Olive Ridley Survival (ORS), is proposed, inspired by the survival challenges faced by hatchlings of the Olive Ridley sea turtle. A major fact about Olive Ridley survival is that, out of one thousand hatchlings that emerge from the nest, only one survives at sea due to various environmental and other factors. This fact acts as the backbone for developing the proposed algorithm. The algorithm has two major phases: hatchling survival through environmental factors, and the impact of movement trajectory on survival. The phases are mathematically modelled and implemented along with a suitable input representation and fitness function. The algorithm is analysed theoretically. To validate the algorithm, fourteen mathematical benchmark functions from standard CEC test suites are evaluated and statistically tested. Also, to study the efficacy of ORS on recent complex benchmark functions, ten benchmark functions of CEC-06-2019 are evaluated. Further, three well-known engineering problems are solved by ORS and compared with other state-of-the-art meta-heuristics. Simulation results show that in many cases, the proposed ORS algorithm outperforms some state-of-the-art meta-heuristic optimization algorithms. The sub-optimal behavior of ORS in some recent benchmark functions is also observed.
Submitted 2 November, 2024; v1 submitted 13 September, 2024;
originally announced September 2024.
-
Small Object Detection for Indoor Assistance to the Blind using YOLO NAS Small and Super Gradients
Authors:
Rashmi BN,
R. Guru,
Anusuya M A
Abstract:
Advancements in object detection algorithms have opened new avenues for assistive technologies that cater to the needs of visually impaired individuals. This paper presents a novel approach for indoor assistance to the blind by addressing the challenge of small object detection. We propose a technique based on the YOLO NAS Small architecture, a lightweight and efficient object detection model, optimized using the Super Gradients training framework. This combination enables real-time detection of small objects crucial for assisting the blind in navigating indoor environments, such as furniture, appliances, and household items. The proposed method emphasizes low latency and high accuracy, enabling timely and informative voice-based guidance to enhance the user's spatial awareness and interaction with their surroundings. The paper details the implementation, presents experimental results, and discusses the system's effectiveness in providing a practical solution for indoor assistance to the visually impaired.
Submitted 28 August, 2024;
originally announced September 2024.
-
Quantization of KLT Matrices via GMRF Modeling of Image Blocks for Adaptive Transform Coding
Authors:
Rashmi Boragolla,
Pradeepa Yahampath
Abstract:
Forward adaptive transform coding of images requires a codebook of transform matrices from which the best transform can be chosen for each macroblock. Codebook construction is a problem of designing a quantizer for Karhunen-Loève transform (KLT) matrices estimated from sample image blocks. We present a novel method for KLT matrix quantization based on a finite-lattice non-causal homogeneous Gauss-Markov random field (GMRF) model with asymmetric Neumann boundary conditions for blocks in natural images. The matrix quantization problem is solved in the GMRF parameter space, simplifying the harder problem of quantizing a large matrix subject to an orthonormality constraint to a low-dimensional vector quantization problem. Typically used GMRF parameter estimation methods such as maximum-likelihood (ML) do not necessarily maximize the coding performance of the resulting transform matrices. To this end we propose a method for GMRF parameter estimation from sample image data, which maximizes the high-rate transform coding gain. We also investigate the application of GMRF-based transforms to variable block-size adaptive transform coding.
Submitted 24 June, 2024;
originally announced July 2024.
-
Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow
Authors:
Yixuan Mei,
Yonghao Zhuang,
Xupeng Miao,
Juncheng Yang,
Zhihao Jia,
Rashmi Vinayak
Abstract:
This paper introduces Helix, a distributed system for high-throughput, low-latency large language model (LLM) serving in heterogeneous GPU clusters. The key idea behind Helix is to formulate inference computation of LLMs over heterogeneous GPUs and network connections as a max-flow problem on directed, weighted graphs, whose nodes represent GPU instances and edges capture both GPU and network heterogeneity through their capacities. Helix then uses a mixed integer linear programming (MILP) algorithm to discover highly optimized strategies to serve LLMs on heterogeneous GPUs. This approach allows Helix to jointly optimize model placement and request scheduling, two highly entangled tasks in heterogeneous LLM serving. Our evaluation on several heterogeneous clusters ranging from 24 to 42 GPU nodes shows that Helix improves serving throughput by up to 3.3x and reduces prompting and decoding latency by up to 66% and 24%, respectively, compared to existing approaches. Helix is available at https://github.com/Thesys-lab/Helix-ASPLOS25.
Submitted 5 March, 2025; v1 submitted 3 June, 2024;
originally announced June 2024.
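As a rough illustration of the max-flow framing described in the Helix abstract, the sketch below computes the maximum flow through a tiny directed graph using the Edmonds-Karp algorithm. It is not Helix's formulation or code: the node names (`gpu_a`, `gpu_b`), the capacities, and the `max_flow` helper are hypothetical, and each edge capacity here simply stands in for whatever throughput limit a GPU or network link would impose.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest residual paths until none remain."""
    # Build a residual graph, adding zero-capacity reverse edges where missing.
    residual = {u: dict(vs) for u, vs in capacity.items()}
    for u, vs in capacity.items():
        for v in vs:
            residual.setdefault(v, {}).setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push flow along the path and update reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical two-GPU cluster; all numbers are made up for illustration.
cluster = {
    "source": {"gpu_a": 10, "gpu_b": 4},
    "gpu_a": {"gpu_b": 3, "sink": 5},
    "gpu_b": {"sink": 8},
}
print(max_flow(cluster, "source", "sink"))
```

In this toy graph the flow is limited by the cut separating `source` and `gpu_a` from the rest (4 + 3 + 5), which is the kind of bottleneck a max-flow view makes visible.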
-
On Low Field Size Constructions of Access-Optimal Convertible Codes
Authors:
Saransh Chopra,
Francisco Maturana,
K. V. Rashmi
Abstract:
Most large-scale storage systems employ erasure coding to provide resilience against disk failures. Recent work has shown that tuning this redundancy to changes in disk failure rates leads to substantial storage savings. This process requires code conversion, wherein data encoded using an $[n^{I}, k^{I}]$ initial code has to be transformed into data encoded using an $[n^{F}, k^{F}]$ final code, a resource-intensive operation. Convertible codes are a class of codes that enable efficient code conversion while maintaining other desirable properties. In this paper, we focus on the access cost of conversion (total number of code symbols accessed in the conversion process) and on an important subclass of conversions known as the merge regime (combining multiple initial codewords into a single final codeword).
In this setting, explicit constructions are known for systematic access-optimal Maximum Distance Separable (MDS) convertible codes for all parameters in the merge regime. However, the existing construction for a key subset of these parameters, which makes use of Vandermonde parity matrices, requires a large field size making it unsuitable for practical applications. In this paper, we provide (1) sharper bounds on the minimum field size requirement for such codes, and (2) explicit constructions for low field sizes for several parameter ranges. In doing so, we provide a proof of super-regularity of specially designed classes of Vandermonde matrices that could be of independent interest.
Submitted 14 May, 2024;
originally announced May 2024.
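The abstract above mentions super-regularity of Vandermonde matrices. As a small sketch, assuming the usual definition (a matrix is super-regular if every square submatrix is nonsingular), the code below brute-force checks that property over a prime field GF(p). The helper names `vandermonde`, `det_mod_p`, and `is_superregular`, the evaluation points, and the field size are all hypothetical choices for illustration, not the paper's construction.

```python
from itertools import combinations


def vandermonde(points, rows, p):
    """Matrix whose (i, j) entry is points[j] ** i mod p."""
    return [[pow(x, i, p) for x in points] for i in range(rows)]


def det_mod_p(matrix, p):
    """Determinant over GF(p), p prime, via Gaussian elimination."""
    m = [[x % p for x in row] for row in matrix]
    n = len(m)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det = det * m[col][col] % p
        inv = pow(m[col][col], -1, p)  # modular inverse (Python 3.8+)
        for r in range(col + 1, n):
            factor = m[r][col] * inv % p
            for c in range(col, n):
                m[r][c] = (m[r][c] - factor * m[col][c]) % p
    return det % p


def is_superregular(matrix, p):
    """True if every square submatrix is nonsingular over GF(p)."""
    rows, cols = len(matrix), len(matrix[0])
    for size in range(1, min(rows, cols) + 1):
        for r_idx in combinations(range(rows), size):
            for c_idx in combinations(range(cols), size):
                sub = [[matrix[r][c] for c in c_idx] for r in r_idx]
                if det_mod_p(sub, p) == 0:
                    return False
    return True


# Points 1, 2, 3 over GF(7) pass; points 1 and 6 fail because 6 = -1 mod 7,
# so the rows for powers 0 and 2 coincide on those two columns.
print(is_superregular(vandermonde([1, 2, 3], 3, 7), 7))
print(is_superregular(vandermonde([1, 6], 3, 7), 7))
```

The exhaustive check grows combinatorially with matrix size, so it is only practical for toy parameters; it is meant to make the definition concrete, not to replace a proof.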
-
InsightNet: Structured Insight Mining from Customer Feedback
Authors:
Sandeep Sricharan Mukku,
Manan Soni,
Jitenkumar Rana,
Chetan Aggarwal,
Promod Yenigalla,
Rashmi Patange,
Shyam Mohan
Abstract:
We propose InsightNet, a novel approach for the automated extraction of structured insights from customer reviews. Our end-to-end machine learning framework is designed to overcome the limitations of current solutions, including the absence of structure for identified topics, non-standard aspect names, and lack of abundant training data. The proposed solution builds a semi-supervised multi-level taxonomy from raw reviews, uses a semantic similarity heuristic to generate labelled data, and employs a multi-task insight extraction architecture by fine-tuning an LLM. InsightNet identifies granular actionable topics with customer sentiments and verbatim for each topic. Evaluations on real-world customer review data show that InsightNet performs better than existing solutions in terms of structure, hierarchy and completeness. We empirically demonstrate that InsightNet outperforms the current state-of-the-art methods in multi-label topic classification, achieving an F1 score of 0.85, an 11% improvement in F1 score over the previous best results. Additionally, InsightNet generalises well for unseen aspects and suggests new topics to be added to the taxonomy.
Submitted 12 May, 2024;
originally announced May 2024.
-
A Unified Module for Accelerating STABLE-DIFFUSION: LCM-LORA
Authors:
Ayush Thakur,
Rashmi Vashisth
Abstract:
This paper presents a comprehensive study on the unified module for accelerating stable-diffusion processes, specifically focusing on the lcm-lora module. Stable-diffusion processes play a crucial role in various scientific and engineering domains, and their acceleration is of paramount importance for efficient computational performance. The standard iterative procedures for solving fixed-source discrete ordinates problems often exhibit slow convergence, particularly in optically thick scenarios. To address this challenge, unconditionally stable diffusion-acceleration methods have been developed, aiming to enhance the computational efficiency of transport equations and discrete ordinates problems. This study delves into the theoretical foundations and numerical results of unconditionally stable diffusion synthetic acceleration methods, providing insights into their stability and performance for model discrete ordinates problems. Furthermore, the paper explores recent advancements in diffusion model acceleration, including on-device acceleration of large diffusion models via GPU-aware optimizations, highlighting the potential for significantly improved inference latency. The results and analyses in this study provide important insights into stable diffusion processes and have important ramifications for the creation and application of acceleration methods, specifically the lcm-lora module, in a variety of computing environments.
Submitted 24 March, 2024;
originally announced March 2024.
-
Loops On Retrieval Augmented Generation (LoRAG)
Authors:
Ayush Thakur,
Rashmi Vashisth
Abstract:
This paper presents Loops On Retrieval Augmented Generation (LoRAG), a new framework designed to enhance the quality of retrieval-augmented text generation through the incorporation of an iterative loop mechanism. The architecture integrates a generative model, a retrieval mechanism, and a dynamic loop module, allowing for iterative refinement of the generated text through interactions with relevant information retrieved from the input context. Experimental evaluations on benchmark datasets demonstrate that LoRAG surpasses existing state-of-the-art models in terms of BLEU score, ROUGE score, and perplexity, showcasing its effectiveness in achieving both coherence and relevance in generated text. The qualitative assessment further illustrates LoRAG's capability to produce contextually rich and coherent outputs. This research contributes valuable insights into the potential of iterative loops in mitigating challenges in text generation, positioning LoRAG as a promising advancement in the field.
Submitted 18 March, 2024;
originally announced March 2024.
-
Agonist-Antagonist Pouch Motors: Bidirectional Soft Actuators Enhanced by Thermally Responsive Peltier Elements
Authors:
Trevor Exley,
Rashmi Wijesundara,
Nathan Tan,
Akshay Sunkara,
Xinyu He,
Shuopu Wang,
Bonnie Chan,
Aditya Jain,
Luis Espinosa,
Amir Jafari
Abstract:
In this study, we introduce a novel Mylar-based pouch motor design that leverages the reversible actuation capabilities of Peltier junctions to enable agonist-antagonist muscle mimicry in soft robotics. Addressing the limitations of traditional silicone-based materials, such as leakage and phase-change fluid degradation, our pouch motors filled with Novec 7000 provide a durable and leak-proof solution for geometric modeling. The integration of flexible Peltier junctions offers a significant advantage over conventional Joule heating methods by allowing active and reversible heating and cooling cycles. This innovation not only enhances the reliability and longevity of soft robotic applications but also broadens the scope of design possibilities, including the development of agonist-antagonist artificial muscles, grippers that can manipulate through flexion and extension, and an anchor-slip style simple crawler design. Our findings indicate that this approach could lead to more efficient, versatile, and durable robotic systems, marking a significant advancement in the field of soft robotics.
Submitted 16 March, 2024;
originally announced March 2024.
-
TVIM: Thermo-Active Variable Impedance Module: Evaluating Shear-Mode Capabilities of Polycaprolactone
Authors:
Trevor Exley,
Rashmi Wijesundara,
Shuopu Wang,
Arian Moridani,
Amir Jafari
Abstract:
In this work, we introduce an advanced thermo-active variable impedance module which builds upon our previous innovation in thermal-based impedance adjustment for actuation systems. Our initial design harnessed the temperature-responsive, viscoelastic properties of Polycaprolactone (PCL) to modulate stiffness and damping, facilitated by integrated flexible Peltier elements. While effective, the reliance on compressing and the inherent stress relaxation characteristics of PCL led to suboptimal response times in impedance adjustments. Addressing these limitations, the current iteration of our module pivots to a novel 'shear-mode' operation. By conducting comprehensive shear rheology analyses on PCL, we have identified a configuration that eliminates the viscoelastic delay, offering a faster response with improved heat transfer efficiency. A key advantage of our module lies in its scalability and elimination of additional mechanical actuators for impedance adjustment. The compactness and efficiency of thermal actuation through Peltier elements allow for significant downsizing, making these thermal, variable impedance modules exceptionally well-suited for applications where space constraints and actuator weight are critical considerations. This development represents a significant leap forward in the design of variable impedance actuators, offering a more versatile, responsive, and compact solution for a wide range of robotic and biomechanical applications.
Submitted 16 March, 2024;
originally announced March 2024.
-
The Decisive Power of Indecision: Low-Variance Risk-Limiting Audits and Election Contestation via Marginal Mark Recording
Authors:
Benjamin Fuller,
Rashmi Pai,
Alexander Russell
Abstract:
Risk-limiting audits (RLAs) are techniques for verifying the outcomes of large elections. While they provide rigorous guarantees of correctness, widespread adoption has been impeded by both efficiency concerns and the fact they offer statistical, rather than absolute, conclusions. We attend to both of these difficulties, defining new families of audits that improve efficiency and offer qualitative advances in statistical power.
Our new audits are enabled by revisiting the standard notion of a cast-vote record so that it can declare multiple possible mark interpretations rather than a single decision; this can reflect the presence of marginal marks, which appear regularly on hand-marked ballots. We show that this simple expedient can offer significant efficiency improvements with only minor changes to existing auditing infrastructure. We consider two ways of representing these marks; both yield risk-limiting comparison audits in the formal sense of Fuller, Harrison, and Russell (IEEE Security & Privacy 2023).
We then define a new type of post-election audit we call a contested audit. These audits permit each candidate to provide a cast-vote record table advancing their own claim to victory. We prove that these audits offer remarkable sample efficiency, yielding control of risk with a constant number of samples (that is, independent of margin). This is a first for an audit with provable soundness. These results are formulated in a game-based security model that specifies quantitative soundness and completeness guarantees. These audits provide a means to handle contestation of election results affirmed by conventional RLAs.
Submitted 17 June, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
-
DatUS^2: Data-driven Unsupervised Semantic Segmentation with Pre-trained Self-supervised Vision Transformer
Authors:
Sonal Kumar,
Arijit Sur,
Rashmi Dutta Baruah
Abstract:
Successive proposals of several self-supervised training schemes continue to emerge, taking one step closer to developing a universal foundation model. In this process, the unsupervised downstream tasks are recognized as one of the evaluation methods to validate the quality of visual features learned with a self-supervised training scheme. However, unsupervised dense semantic segmentation has not been explored as a downstream task, which can utilize and evaluate the quality of semantic information introduced in patch-level feature representations during self-supervised training of a vision transformer. Therefore, this paper proposes a novel data-driven approach for unsupervised semantic segmentation (DatUS^2) as a downstream task. DatUS^2 generates semantically consistent and dense pseudo-annotated segmentation masks for the unlabeled image dataset without using any visual-prior or synchronized data. We compare these pseudo-annotated segmentation masks with ground truth masks for evaluating recent self-supervised training schemes to learn shared semantic properties at the patch level and discriminative semantic properties at the segment level. Finally, we evaluate existing state-of-the-art self-supervised training schemes with our proposed downstream task, i.e., DatUS^2. Also, the best version of DatUS^2 outperforms the existing state-of-the-art method for the unsupervised dense semantic segmentation task with 15.02% mIoU and 21.47% pixel accuracy on the SUIM dataset. It also achieves a competitive level of accuracy for a large-scale and complex dataset, i.e., the COCO dataset.
Submitted 23 January, 2024;
originally announced January 2024.
-
Modeling Homophily in Exponential-Family Random Graph Models for Bipartite Networks
Authors:
Rashmi P. Bomiriya,
Alina R. Kuvelkar,
David R. Hunter,
Steffen Triebel
Abstract:
Homophily, the tendency of individuals who are alike to form ties with one another, is an important concept in the study of social networks. Yet accounting for homophily effects is complicated in the context of bipartite networks where ties connect individuals not with one another but rather with a separate set of nodes, which might also be individuals but which are often an entirely different type of object. As a result, much work on the effect of homophily in a bipartite network proceeds by first eliminating the bipartite structure, collapsing a two-mode network to a one-mode network and thereby ignoring potentially meaningful structure in the data. We introduce a set of methods to model homophily on bipartite networks without losing information in this way, then we demonstrate that these methods allow for substantively interesting findings in management science not possible using standard techniques. These methods are implemented in the widely-used ergm package for R.
Submitted 9 December, 2023;
originally announced December 2023.
-
User Persona Identification and New Service Adaptation Recommendation
Authors:
Narges Tabari,
Sandesh Swamy,
Rashmi Gangadharaiah
Abstract:
Providing a personalized user experience on information-dense webpages helps users reach their end-goals sooner. We explore an automated approach to identifying user personas by leveraging high-dimensional trajectory information from user sessions on webpages. While neural collaborative filtering (NCF) approaches pay little attention to token semantics, our method introduces SessionBERT, a Transformer-backed language model trained from scratch on the masked language modeling (MLM) objective for user trajectories (pages, metadata, billing in a session), aiming to capture semantics within them. Our results show that representations learned through SessionBERT consistently outperform a BERT-base model, providing a 3% and 1% relative improvement in F1-score for predicting page links and next services, respectively. We leverage SessionBERT and extend it to provide recommendations (top-5) for the next most-relevant services that a user would be likely to use. We achieve a HIT@5 of 58% from our recommendation model.
Submitted 15 November, 2023;
originally announced November 2023.
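The HIT@5 figure reported in the abstract above is a top-k hit rate. The sketch below shows how such a metric is commonly computed. The abstract does not describe the paper's exact evaluation protocol, so the function name `hit_at_k`, its input layout, and the service names in the usage note are all hypothetical.

```python
def hit_at_k(ranked_lists, true_items, k=5):
    """Fraction of sessions whose true next item appears in the top-k ranking."""
    if len(ranked_lists) != len(true_items):
        raise ValueError("ranked_lists and true_items must have the same length")
    if not ranked_lists:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_lists, true_items)
        if truth in ranked[:k]  # only the first k recommendations count
    )
    return hits / len(ranked_lists)
```

With two sessions, ranked recommendations `[["storage", "compute"], ["queue", "database"]]`, and true next services `["compute", "search"]`, the first session is a hit and the second is a miss, so `hit_at_k(..., k=5)` returns 0.5.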
-
Bring Your Own KG: Self-Supervised Program Synthesis for Zero-Shot KGQA
Authors:
Dhruv Agarwal,
Rajarshi Das,
Sopan Khosla,
Rashmi Gangadharaiah
Abstract:
We present BYOKG, a universal question-answering (QA) system that can operate on any knowledge graph (KG), requires no human-annotated training data, and can be ready to use within a day -- attributes that are out-of-scope for current KGQA systems. BYOKG draws inspiration from the remarkable ability of humans to comprehend information present in an unseen KG through exploration -- starting at random nodes, inspecting the labels of adjacent nodes and edges, and combining them with their prior world knowledge. In BYOKG, exploration leverages an LLM-backed symbolic agent that generates a diverse set of query-program exemplars, which are then used to ground a retrieval-augmented reasoning procedure to predict programs for arbitrary questions. BYOKG is effective over both small- and large-scale graphs, showing dramatic gains in QA accuracy over a zero-shot baseline of 27.89 and 58.02 F1 on GrailQA and MetaQA, respectively. On GrailQA, we further show that our unsupervised BYOKG outperforms a supervised in-context learning method, demonstrating the effectiveness of exploration. Lastly, we find that performance of BYOKG reliably improves with continued exploration as well as improvements in the base LLM, notably outperforming a state-of-the-art fine-tuned model by 7.08 F1 on a sub-sampled zero-shot split of GrailQA.
Submitted 21 May, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Physics-Informed Data Denoising for Real-Life Sensing Systems
Authors:
Xiyuan Zhang,
Xiaohan Fu,
Diyan Teng,
Chengyu Dong,
Keerthivasan Vijayakumar,
Jiayun Zhang,
Ranak Roy Chowdhury,
Junsheng Han,
Dezhi Hong,
Rashmi Kulkarni,
Jingbo Shang,
Rajesh Gupta
Abstract:
Sensors measuring real-life physical processes are ubiquitous in today's interconnected world. These sensors inherently bear noise that often adversely affects performance and reliability of the systems they support. Classic filtering-based approaches introduce strong assumptions on the time or frequency characteristics of sensory measurements, while learning-based denoising approaches typically rely on using ground truth clean data to train a denoising model, which is often challenging or prohibitive to obtain for many real-world applications. We observe that in many scenarios, the relationships between different sensor measurements (e.g., location and acceleration) are analytically described by laws of physics (e.g., second-order differential equation). By incorporating such physics constraints, we can guide the denoising process to improve even in the absence of ground truth data. In light of this, we design a physics-informed denoising model that leverages the inherent algebraic relationships between different measurements governed by the underlying physics. By obviating the need for ground truth clean data, our method offers a practical denoising solution for real-world applications. We conducted experiments in various domains, including inertial navigation, CO2 monitoring, and HVAC control, and achieved state-of-the-art performance compared with existing denoising methods. Our method can denoise data in real time (4ms for a sequence of 1s) for low-cost noisy sensors and produces results that closely align with those from high-precision, high-cost alternatives, leading to an efficient, cost-effective approach for more accurate sensor-based systems.
Submitted 12 November, 2023;
originally announced November 2023.
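The abstract above gives location and acceleration as an example of measurements linked by a second-order differential equation. One way to turn that link into a ground-truth-free consistency term is sketched below. This is an illustrative assumption, not the paper's model: it uses uniform sampling with step `dt` and a central finite difference, and the names `second_difference` and `physics_residual` are hypothetical.

```python
def second_difference(positions, dt):
    """Central finite-difference estimate of acceleration at interior samples."""
    return [
        (positions[i + 1] - 2.0 * positions[i] + positions[i - 1]) / (dt * dt)
        for i in range(1, len(positions) - 1)
    ]


def physics_residual(positions, accelerations, dt):
    """Mean squared mismatch between measured acceleration and d^2(position)/dt^2."""
    if len(positions) != len(accelerations) or len(positions) < 3:
        raise ValueError("need two equal-length series with at least 3 samples")
    implied = second_difference(positions, dt)
    measured = accelerations[1:-1]  # align with the interior samples
    errors = [(a - b) ** 2 for a, b in zip(implied, measured)]
    return sum(errors) / len(errors)
```

A residual near zero means the two sensor streams agree with the physical relation. A denoiser could be trained to reduce a term of this kind without access to clean reference data.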
-
Document-Level Supervision for Multi-Aspect Sentiment Analysis Without Fine-grained Labels
Authors:
Kasturi Bhattacharjee,
Rashmi Gangadharaiah
Abstract:
Aspect-based sentiment analysis (ABSA) is a widely studied topic, most often trained through supervision from human annotations of opinionated texts. These fine-grained annotations include identifying aspects towards which a user expresses their sentiment, and their associated polarities (aspect-based sentiments). Such fine-grained annotations can be expensive and often infeasible to obtain in real-world settings. There is, however, an abundance of scenarios where user-generated text contains an overall sentiment, such as a rating of 1-5 in user reviews or user-generated feedback, which may be leveraged for this task. In this paper, we propose a VAE-based topic modeling approach that performs ABSA using document-level supervision and without requiring fine-grained labels for either aspects or sentiments. Our approach allows for the detection of multiple aspects in a document, thereby allowing for the possibility of reasoning about how sentiment expressed through multiple aspects comes together to form an observable overall document-level sentiment. We demonstrate results on two benchmark datasets from two different domains, significantly outperforming a state-of-the-art baseline.
Submitted 10 October, 2023;
originally announced October 2023.
-
GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption
Authors:
Kaustubh Shivdikar,
Yuhui Bao,
Rashmi Agrawal,
Michael Shen,
Gilbert Jonatan,
Evelio Mora,
Alexander Ingare,
Neal Livesay,
José L. Abellán,
John Kim,
Ajay Joshi,
David Kaeli
Abstract:
Fully Homomorphic Encryption (FHE) enables the processing of encrypted data without decrypting it. FHE has garnered significant attention over the past decade as it supports secure outsourcing of data processing to remote cloud services. Despite its promise of strong data privacy and security guarantees, FHE introduces a slowdown of up to five orders of magnitude as compared to the same computation using plaintext data. This overhead is presently a major barrier to the commercial adoption of FHE.
In this work, we leverage GPUs to accelerate FHE, capitalizing on a well-established GPU ecosystem available in the cloud. We propose GME, which combines three key microarchitectural extensions along with a compile-time optimization to the current AMD CDNA GPU architecture. First, GME integrates a lightweight on-chip compute unit (CU)-side hierarchical interconnect to retain ciphertext in cache across FHE kernels, thus eliminating redundant memory transactions. Second, to tackle compute bottlenecks, GME introduces special MOD-units that provide native custom hardware support for modular reduction operations, one of the most commonly executed sets of operations in FHE. Third, by integrating the MOD-unit with our novel pipelined $64$-bit integer arithmetic cores (WMAC-units), GME further accelerates FHE workloads by $19\%$. Finally, we propose a Locality-Aware Block Scheduler (LABS) that exploits the temporal locality available in FHE primitive blocks. Incorporating these microarchitectural features and compiler optimizations, we create a synergistic approach achieving average speedups of $796\times$, $14.2\times$, and $2.3\times$ over Intel Xeon CPU, NVIDIA V100 GPU, and Xilinx FPGA implementations, respectively.
Submitted 19 September, 2023;
originally announced September 2023.
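The abstract above names modular reduction as one of the most commonly executed operations in FHE. Barrett reduction is one well-known way to compute it without a division per operation, and it is sketched below for illustration. The abstract does not state which algorithm the MOD-units implement, so this choice, the function names `barrett_setup` and `barrett_reduce`, and the modulus used in the usage note are assumptions.

```python
def barrett_setup(q):
    """Precompute the shift width k and the constant mu = floor(4^k / q)."""
    k = q.bit_length()
    mu = (1 << (2 * k)) // q  # the only division, done once per modulus
    return k, mu


def barrett_reduce(x, q, k, mu):
    """Return x mod q for 0 <= x < q*q using multiplies, shifts, and subtractions."""
    t = (x * mu) >> (2 * k)  # underestimate of floor(x / q)
    r = x - t * q            # r is congruent to x mod q and non-negative
    while r >= q:            # a small number of corrective subtractions
        r -= q
    return r
```

For instance, with `q = 65537` and `k, mu = barrett_setup(q)`, calling `barrett_reduce(a * b, q, k, mu)` for residues `a` and `b` below `q` gives the same result as `(a * b) % q`.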
-
Trajectory Prediction for Robot Navigation using Flow-Guided Markov Neural Operator
Authors:
Rashmi Bhaskara,
Hrishikesh Viswanath,
Aniket Bera
Abstract:
Predicting pedestrian movements remains a complex and persistent challenge in robot navigation research. We must evaluate several factors to achieve accurate predictions, such as pedestrian interactions, the environment, crowd density, and social and cultural norms. Accurate prediction of pedestrian paths is vital for ensuring safe human-robot interaction, especially in robot navigation. Furthermore, this research has potential applications in autonomous vehicles, pedestrian tracking, and human-robot collaboration. Therefore, in this paper, we introduce FlowMNO, an Optical Flow-Integrated Markov Neural Operator designed to capture pedestrian behavior across diverse scenarios. Our paper models trajectory prediction as a Markovian process, where future pedestrian coordinates depend solely on the current state. This problem formulation eliminates the need to store previous states. We conducted experiments using standard benchmark datasets like ETH, HOTEL, ZARA1, ZARA2, UCY, and RGB-D pedestrian datasets. Our study demonstrates that FlowMNO outperforms some of the state-of-the-art deep learning methods like LSTM, GAN, and CNN-based approaches, by approximately 86.46% when predicting pedestrian trajectories. Thus, we show that FlowMNO can seamlessly integrate into robot navigation systems, enhancing their ability to navigate crowded areas smoothly.
Submitted 18 September, 2023; v1 submitted 16 September, 2023;
originally announced September 2023.
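The Markovian formulation described in the abstract above means that each predicted state is computed from the current state alone, so no history buffer is needed. The sketch below shows only that rollout structure. The constant-velocity `constant_velocity_step` is a hypothetical stand-in for a learned operator, and the state layout `(x, y, vx, vy)` and the time step `dt=0.5` are arbitrary assumptions rather than details from the paper.

```python
def rollout(step, state, num_steps):
    """Apply a one-step predictor repeatedly; only the latest state is fed back."""
    trajectory = [state]
    for _ in range(num_steps):
        state = step(state)  # the next state depends on the current state only
        trajectory.append(state)
    return trajectory


def constant_velocity_step(state, dt=0.5):
    """Toy one-step model: move along the current velocity, keep velocity fixed."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)
```

Calling `rollout(constant_velocity_step, (0.0, 0.0, 1.0, 0.0), 3)` returns four states, the initial one plus three predictions, with the pedestrian advancing 0.5 units along x per step.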
-
Studying Accuracy of Machine Learning Models Trained on Lab Lifting Data in Solving Real-World Problems Using Wearable Sensors for Workplace Safety
Authors:
Joseph Bertrand,
Nick Griffey,
Ming-Lun Lu,
Rashmi Jha
Abstract:
Porting ML models trained on lab data to real-world situations has long been a challenge. This paper discusses porting a lab-trained lifting identification model to the real world. With performance much lower than on training data, we explored causes of the failure and proposed four potential solutions to increase model performance.
Submitted 11 September, 2023;
originally announced September 2023.
-
A Flexible Architecture for Broadcast Broadband Convergence in Beyond 5G
Authors:
Rashmi Yadav,
Rashmi Kamran,
Pranav Jha,
Abhay Karandikar
Abstract:
There has been an exponential increase in the usage of multimedia services in mobile networks in recent years. To address this accelerating data demand, mobile networks are experiencing a subtle transformation in their architecture. One of the changes in this direction is the support of Multicast/Broadcast Service (MBS) in the Third Generation Partnership Project (3GPP) Fifth Generation (5G) network. The MBS has been introduced to enhance resource utilization and user experience in 3GPP 5G networks. However, there are certain limitations in the 3GPP 5G MBS architecture, such as the selection of the delivery method (unicast or broadcast) by the core network (may result in sub-optimal radio resource utilization) and no provision for converging non-3GPP broadcast technologies (like digital terrestrial television) with cellular (3GPP 5G) broadband. In this context, we propose a new architecture for the convergence of cellular broadband and non-3GPP broadcast networks. A novelty of the architecture is that it treats signalling exchange with User Equipment (UE) as data (service) which results in improved scalability of mobile networks. The architecture supports enhanced flexibility in choosing a delivery method (3GPP 5G unicast, 3GPP 5G broadcast, or non-3GPP broadcast) for user data. We evaluate the performance of the proposed architecture using process algebra-based simulations, demonstrating a significant reduction in the number of signalling messages exchanged between the UE and the network for MBS session establishment as compared to the 3GPP 5G network.
Submitted 2 January, 2024; v1 submitted 11 September, 2023;
originally announced September 2023.
-
ChatSim: Underwater Simulation with Natural Language Prompting
Authors:
Aadi Palnitkar,
Rashmi Kapu,
Xiaomin Lin,
Cheng Liu,
Nare Karapetyan,
Yiannis Aloimonos
Abstract:
Robots are becoming an essential part of many operations including marine exploration or environmental monitoring. However, the underwater environment presents many challenges, including high pressure, limited visibility, and harsh conditions that can damage equipment. Real-world experimentation can be expensive and difficult to execute. Therefore, it is essential to simulate the performance of underwater robots in comparable environments to ensure their optimal functionality within practical real-world contexts. OysterSim generates photo-realistic images and segmentation masks of objects in marine environments, providing valuable training data for underwater computer vision applications. By integrating ChatGPT into underwater simulations, users can convey their thoughts effortlessly and intuitively create desired underwater environments without intricate coding. Moreover, researchers can realize substantial time and cost savings by evaluating their algorithms across diverse underwater conditions in the simulation. The objective of ChatSim is to integrate Large Language Models (LLM) with a simulation environment (OysterSim), enabling direct control of the simulated environment via natural language input. This advancement can greatly enhance the capabilities of underwater simulation, with far-reaching benefits for marine exploration and broader scientific research endeavors.
Submitted 9 August, 2023; v1 submitted 8 August, 2023;
originally announced August 2023.
-
An Architecture for Control Plane Slicing in Beyond 5G Networks
Authors:
Rashmi Yadav,
Rashmi Kamran,
Pranav Jha,
Abhay Karandikar
Abstract:
To accommodate various use cases with differing characteristics, the Fifth Generation (5G) mobile communications system intends to utilize network slicing. Network slicing enables the creation of multiple logical networks over a shared physical network infrastructure. While problems such as resource allocation for multiple slices in mobile networks have been explored in considerable detail in the existing literature, the suitability of the existing mobile network architecture to support network slicing has not been analysed adequately. We think the existing 5G System (5GS) architecture suffers from certain limitations, such as a lack of slice isolation in its control plane. This work focuses on the future evolution of the existing 5GS architecture from a slicing perspective, especially that of its control plane, addressing some of the limitations of the existing 5GS architecture. We propose a new network architecture which enables efficient slicing in beyond 5G networks. The proposed architecture results in enhanced modularity and scalability of the control plane in sliced mobile networks. It also brings slice isolation to the control plane, which is not feasible in the existing 5G system. We also present a performance evaluation that confirms the improved performance and scalability of the proposed system vis-à-vis the existing 5G system.
Submitted 12 July, 2023;
originally announced July 2023.
-
Applying SDN to Mobile Networks: A New Perspective for 6G Architecture
Authors:
Rashmi Yadav,
Rashmi Kamran,
Pranav Jha,
Abhay Karandikar
Abstract:
The upcoming Sixth Generation (6G) mobile communications system envisions supporting a variety of use cases with differing characteristics, e.g., very low to extremely high data rates, diverse latency needs, ultra massive connectivity, sustainable communications, and ultra-wide coverage. To accommodate these diverse use cases, the 6G system architecture needs to be scalable, modular, and flexible, both in its user plane and its control plane. In this paper, we identify some limitations of the existing Fifth Generation System (5GS) architecture, especially those of its control plane. Further, we propose a novel architecture for the 6G System (6GS) employing Software Defined Networking (SDN) technology to address these limitations of the control plane. The control plane in the existing 5GS supports two different categories of functionality: handling end-user signalling (e.g., user registration, authentication) and controlling user plane functions. We propose to move the end-user signalling functionality out of the mobile network control plane and treat it as a user service, i.e., as payload or data. This proposal results in an evolved service-driven architecture for mobile networks, bringing increased simplicity, modularity, scalability, flexibility and security to its control plane. The proposed architecture can also provide service-specific signalling support, if needed, making it better suited for diverse 6GS use cases. To demonstrate the advantages of the proposed architecture, we also compare its performance with the 5GS using a process algebra-based simulation tool.
Submitted 12 July, 2024; v1 submitted 12 July, 2023;
originally announced July 2023.
-
Leveraging Residue Number System for Designing High-Precision Analog Deep Neural Network Accelerators
Authors:
Cansu Demirkiran,
Rashmi Agrawal,
Vijay Janapa Reddi,
Darius Bunandar,
Ajay Joshi
Abstract:
Achieving high accuracy, while maintaining good energy efficiency, in analog DNN accelerators is challenging as high-precision data converters are expensive. In this paper, we overcome this challenge by using the residue number system (RNS) to compose high-precision operations from multiple low-precision operations. This enables us to eliminate the information loss caused by the limited precision of the ADCs. Our study shows that RNS can achieve 99% FP32 accuracy for state-of-the-art DNN inference using data converters with only 6-bit precision. We propose using redundant RNS to achieve a fault-tolerant analog accelerator. In addition, we show that RNS can reduce the energy consumption of the data converters within an analog accelerator by several orders of magnitude compared to a regular fixed-point approach.
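The idea of composing a high-precision result from several low-precision residue channels can be illustrated with a small sketch. This is a generic illustration of RNS arithmetic with Chinese Remainder Theorem reconstruction, not the authors' accelerator design; the moduli set and the function names (`to_rns`, `rns_dot`, `from_rns`) are assumptions, chosen so that every residue fits in 6 bits.

```python
from math import prod

# Assumed moduli: pairwise coprime and all below 64, so each residue fits in 6 bits.
MODULI = (63, 62, 61, 59)


def to_rns(x, moduli=MODULI):
    """Represent a non-negative integer by its residues, one per modulus."""
    return tuple(x % m for m in moduli)


def rns_dot(a_vec, b_vec, moduli=MODULI):
    """Dot product computed independently in each residue channel.

    Every channel only ever stores values smaller than its modulus,
    which is the low-precision property the sketch is meant to show.
    """
    result = []
    for m in moduli:
        acc = 0
        for a, b in zip(a_vec, b_vec):
            acc = (acc + (a % m) * (b % m)) % m
        result.append(acc)
    return tuple(result)


def from_rns(residues, moduli=MODULI):
    """Reconstruct the integer from its residues (Chinese Remainder Theorem)."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        # pow(partial, -1, m) is the modular inverse of partial modulo m.
        total += r * partial * pow(partial, -1, m)
    return total % big_m
```

For example, `from_rns(rns_dot([12, 45, 7], [30, 2, 19]))` recovers 583, the ordinary integer dot product. The reconstruction is exact only while the true result stays below the product of the moduli, and this sketch handles non-negative integers only.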
Submitted 15 June, 2023;
originally announced June 2023.
-
From Local to Global: Navigating Linguistic Diversity in the African Context
Authors:
Rashmi Margani,
Nelson Ndugu
Abstract:
The focus is on critical problems in NLP related to linguistic diversity and variation across the African continent, specifically with regard to African local dialects and Arabic dialects that have received little attention. We evaluated our various approaches, demonstrating their effectiveness while highlighting the potential impact of the proposed approach on businesses seeking to improve customer experience and product development in African local dialects. The idea of using the model as a teaching tool for product-based instruction is interesting, as it could potentially stimulate interest in learners and trigger techno entrepreneurship. Overall, our modified approach offers a promising analysis of the challenges of dealing with African local dialects, particularly Arabic dialects, which could have a significant impact on businesses seeking to improve customer experience and product development.
Submitted 2 May, 2023;
originally announced May 2023.
-
AutoCure: Automated Tabular Data Curation Technique for ML Pipelines
Authors:
Mohamed Abdelaal,
Rashmi Koparde,
Harald Schoening
Abstract:
Machine learning algorithms have become increasingly prevalent in multiple domains, such as autonomous driving, healthcare, and finance. In such domains, data preparation remains a significant challenge in developing accurate models, requiring significant expertise and time investment to search the huge search space of well-suited data curation and transformation tools. To address this challenge, we present AutoCure, a novel and configuration-free data curation pipeline that improves the quality of tabular data. Unlike traditional data curation methods, AutoCure synthetically enhances the density of the clean data fraction through an adaptive ensemble-based error detection method and a data augmentation module. In practice, AutoCure can be integrated with open source tools, e.g., Auto-sklearn, H2O, and TPOT, to promote the democratization of machine learning. As a proof of concept, we provide a comparative evaluation of AutoCure against 28 combinations of traditional data curation tools, demonstrating superior performance and predictive accuracy without user intervention. Our evaluation shows that AutoCure is an effective approach to automating data preparation and improving the accuracy of machine learning models.
Submitted 26 April, 2023;
originally announced April 2023.
-
Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model
Authors:
Rashmi Ranjan Bhuyan,
Adel Javanmard,
Sungchul Kim,
Gourab Mukherjee,
Ryan A. Rossi,
Tong Yu,
Handong Zhao
Abstract:
We consider dynamic pricing strategies in a streamed longitudinal data set-up where the objective is to maximize, over time, the cumulative profit across a large number of customer segments. We consider a dynamic model with the consumers' preferences as well as price sensitivity varying over time. Building on the well-known finding that consumers sharing similar characteristics act in similar ways, we consider a global shrinkage structure, which assumes that the consumers' preferences across the different segments can be well approximated by a spatial autoregressive (SAR) model. In such a streamed longitudinal set-up, we measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. We propose a pricing policy based on penalized stochastic gradient descent (PSGD) and explicitly characterize its regret as a function of time, the temporal variability in the model parameters, and the strength of the auto-correlation network structure spanning the varied customer segments. Our regret analysis results not only demonstrate asymptotic optimality of the proposed policy but also show that, for policy planning, it is essential to incorporate available structural information, as policies based on unshrunken models are highly sub-optimal in the aforementioned set-up. We conduct simulation experiments across a wide range of regimes as well as real-world network-based studies and report encouraging performance for our proposed method.
Submitted 13 October, 2023; v1 submitted 27 March, 2023;
originally announced March 2023.
-
Generalized Distance Metric for Various DHT Routing Algorithms in Peer-to-Peer Networks
Authors:
Rashmi Kushwaha,
Shreyas Kulkarni,
Yatindra Nath Singh
Abstract:
We present a generalized distance metric that can be used to implement routing strategies and identify routing table entries to reach the root node for a given key, in a DHT (Distributed Hash Table) network based on either Chord, Kademlia, Tapestry, or Pastry. The generalization shows that all four of these DHT algorithms are, in fact, the same algorithm but with different parameters in the distance representation. We also propose that nodes can have routing tables of varying sizes based on their memory capabilities, provided that each node has at least two entries in each ring component for all the algorithms: one for the node that is closest from it, and the other for the node from which it is closest. Messages will always reach the correct root nodes by following the above rule. We further observe that in any network, if the distance metric used to define the root node in the DHT is the same at all the nodes, then the root node for a key will also be the same, irrespective of the size of the routing table at different nodes.
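To make the role of the distance function concrete, the sketch below contrasts two well-known DHT distance definitions: Chord's clockwise ring distance and Kademlia's XOR distance. It is not the generalized metric proposed in the paper, whose form the abstract does not give; the identifier width `BITS`, the node IDs, and the helper `root_node` are illustrative assumptions.

```python
BITS = 8  # assumed identifier width; real DHTs use 128- or 160-bit IDs


def chord_distance(a, b):
    """Clockwise distance from a to b on a ring of size 2**BITS (asymmetric)."""
    return (b - a) % (1 << BITS)


def kademlia_distance(a, b):
    """XOR distance between two identifiers (symmetric)."""
    return a ^ b


def root_node(key, nodes, distance):
    """Return the node closest to the key under the given distance function.

    This takes a global view of the network for illustration; a real DHT
    reaches the same node hop by hop through routing tables.
    """
    return min(nodes, key=lambda node: distance(key, node))


nodes = [10, 60, 130, 200]
chord_root = root_node(64, nodes, chord_distance)
kademlia_root = root_node(64, nodes, kademlia_distance)
```

With these node IDs and key `64`, the Chord-style distance selects node 130 (the first node clockwise from the key), while the XOR distance selects node 10. The lookup rule is identical in both cases; only the distance function differs.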
Submitted 28 February, 2024; v1 submitted 24 March, 2023;
originally announced March 2023.
-
Machine learning based biomedical image processing for echocardiographic images
Authors:
Ayesha Heena,
Nagashettappa Biradar,
Najmuddin M. Maroof,
Surbhi Bhatia,
Rashmi Agarwal,
Kanta Prasad
Abstract:
The popularity of artificial intelligence and machine learning has prompted researchers to use them in recent research. The proposed method uses the K-Nearest Neighbor (KNN) algorithm for segmentation of medical images, extracting image features for analysis by classifying the data based on neural networks. Classification of images is very important in medical imaging, and KNN is a suitable algorithm that is conceptually and computationally simple and provides very good accuracy. KNN is a user-friendly approach with a wide range of applications in machine learning, and it is widely used for image processing tasks including classification, segmentation and regression. The proposed system uses gray level co-occurrence matrix features. The trained neural network has been tested successfully on a group of echocardiographic images, and errors were compared using a regression plot. The results of the algorithm are evaluated using various quantitative as well as qualitative metrics and shown to exhibit better performance on both than current state-of-the-art methods in the related area. The regression analysis performed to assess the trained neural network showed a good correlation.
Submitted 16 March, 2023;
originally announced March 2023.
-
SG-LSTM: Social Group LSTM for Robot Navigation Through Dense Crowds
Authors:
Rashmi Bhaskara,
Maurice Chiu,
Aniket Bera
Abstract:
With the increasing availability and affordability of personal robots, they will no longer be confined to large corporate warehouses or factories but will instead be expected to operate in less controlled environments alongside larger groups of people. In addition to ensuring safety and efficiency, it is crucial to minimize any negative psychological impact robots may have on humans and to follow unwritten social norms in these situations. Our research aims to develop a model that can predict the movements of pedestrians and perceptually-social groups in crowded environments. We introduce a new Social Group Long Short-term Memory (SG-LSTM) model that models human groups and interactions in dense environments using a socially-aware LSTM to produce more accurate trajectory predictions. Our approach enables navigation algorithms to calculate collision-free paths faster and more accurately in crowded environments. We also release a large video dataset with labeled pedestrian groups for the broader social navigation community. We show comparisons with different metrics on different datasets (ETH, Hotel, MOT15) and different prediction approaches (LIN, LSTM, O-LSTM, S-LSTM), as well as runtime performance.
Submitted 6 August, 2023; v1 submitted 7 March, 2023;
originally announced March 2023.
-
RISE: RISC-V SoC for En/decryption Acceleration on the Edge for Homomorphic Encryption
Authors:
Zahra Azad,
Guowei Yang,
Rashmi Agrawal,
Daniel Petrisko,
Michael Taylor,
Ajay Joshi
Abstract:
Today, edge devices commonly connect to the cloud to use its storage and compute capabilities. This leads to security and privacy concerns about user data. Homomorphic Encryption (HE) is a promising solution to address the data privacy problem as it allows arbitrarily complex computations on encrypted data without ever needing to decrypt it. While there has been a lot of work on accelerating HE computations in the cloud, little attention has been paid to the message-to-ciphertext and ciphertext-to-message conversion operations on the edge. In this work, we profile the edge-side conversion operations, and our analysis shows that, during conversion, the error sampling, encryption, and decryption operations are the bottlenecks. To overcome these bottlenecks, we present RISE, an area and energy-efficient RISC-V SoC. RISE leverages an efficient and lightweight pseudo-random number generator core and combines it with fast sampling techniques to accelerate the error sampling operations. To accelerate the encryption and decryption operations, RISE uses scalable, data-level parallelism to implement the number theoretic transform operation, the main bottleneck within the encryption and decryption operations. In addition, RISE saves area by implementing a unified en/decryption datapath, and efficiently exploits techniques like memory reuse and data reordering to utilize a minimal amount of on-chip memory. We evaluate RISE using a complete RTL design containing a RISC-V processor interfaced with our accelerator. Our analysis reveals that for message-to-ciphertext conversion and ciphertext-to-message conversion, using RISE leads to solutions that are up to 6191.19X and 2481.44X more energy-efficient, respectively, than using just the RISC-V processor.
Submitted 14 February, 2023;
originally announced February 2023.
-
Contextual Dynamic Prompting for Response Generation in Task-oriented Dialog Systems
Authors:
Sandesh Swamy,
Narges Tabari,
Chacha Chen,
Rashmi Gangadharaiah
Abstract:
Response generation is one of the critical components in task-oriented dialog systems. Existing studies have shown that large pre-trained language models can be adapted to this task. The typical paradigm for adapting such extremely large language models is fine-tuning on the downstream tasks, which is not only time-consuming but also requires significant resources and access to fine-tuning data. Prompting (Schick and Schütze, 2020) has been an alternative to fine-tuning in many NLP tasks. In our work, we explore the idea of using prompting for response generation in task-oriented dialog systems. Specifically, we propose an approach that performs contextual dynamic prompting where the prompts are learnt from dialog contexts. We aim to distill useful prompting signals from the dialog context. In experiments on the MultiWOZ 2.2 dataset (Zang et al., 2020), we show that contextual dynamic prompts improve response generation in terms of combined score (Mehri et al., 2019) by 3 absolute points, and by a massive 20 points when dialog states are incorporated. Furthermore, human annotation on these conversations found that agents which incorporate context were preferred over agents with vanilla prefix-tuning.
Submitted 10 February, 2023; v1 submitted 30 January, 2023;
originally announced January 2023.
-
Explicit Context Integrated Recurrent Neural Network for Sensor Data Applications
Authors:
Rashmi Dutta Baruah,
Mario Muñoz Organero
Abstract:
The development and progress in sensor, communication and computing technologies have led to data-rich environments. In such environments, data can easily be acquired not only from the monitored entities but also from the surroundings where the entity is operating. The additional data that are available from the problem domain, which cannot be used independently for learning models, constitute context. Such context, if taken into account while learning, can potentially improve the performance of predictive models. Typically, the data from various sensors are present in the form of time series. Recurrent Neural Networks (RNNs) are preferred for such data as they can inherently handle temporal context. However, the conventional RNN models such as Elman RNN, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) in their present form do not provide any mechanism to integrate explicit contexts. In this paper, we propose a Context Integrated RNN (CiRNN) that enables integrating explicit contexts represented in the form of contextual features. In CiRNN, the network weights are influenced by contextual features in such a way that the primary input features which are more relevant to a given context are given more importance. To show the efficacy of CiRNN, we selected an application domain, engine health prognostics, which captures data from various sensors and where contextual information is available. We used the NASA Turbofan Engine Degradation Simulation dataset for estimating Remaining Useful Life (RUL) as it provides contextual information. We compared CiRNN with baseline models as well as the state-of-the-art methods. The experimental results show an improvement of 39% and 87%, respectively, over state-of-the-art models when performance is measured with RMSE and with the score from an asymmetric scoring function. The latter measure is specific to the task of RUL estimation.
Submitted 12 January, 2023;
originally announced January 2023.
-
Privacy Adhering Machine Un-learning in NLP
Authors:
Vinayshekhar Bannihatti Kumar,
Rashmi Gangadharaiah,
Dan Roth
Abstract:
Regulations such as the General Data Protection Regulation (GDPR) in the EU and the California Consumer Privacy Act (CCPA) in the US include provisions on the right to be forgotten that require industry applications to remove data related to an individual from their systems. In several real-world industry applications that use Machine Learning to build models on user data, such mandates require significant effort in terms of both data cleansing and model retraining, while ensuring the models do not deteriorate in prediction quality due to removal of data. As a result, continuous removal of data and model retraining steps do not scale if these applications receive such requests at a very high frequency. Recently, a few researchers proposed the idea of Machine Unlearning to tackle this challenge. Despite the significant importance of this task, the area of Machine Unlearning is under-explored in Natural Language Processing (NLP) tasks. In this paper, we explore the Unlearning framework on various GLUE tasks (Wang et al., 2018), such as QQP, SST and MNLI. We propose computationally efficient approaches (SISA-FC and SISA-A) to perform guaranteed Unlearning that provide significant reductions in memory (90-95%), time (100x) and space consumption (99%) in comparison to the baselines while keeping model performance constant.
Submitted 19 December, 2022;
originally announced December 2022.
-
Zero-Shot Learning for Joint Intent and Slot Labeling
Authors:
Rashmi Gangadharaiah,
Balakrishnan Narayanaswamy
Abstract:
It is expensive and difficult to obtain the large number of sentence-level intent and token-level slot label annotations required to train neural network (NN)-based Natural Language Understanding (NLU) components of task-oriented dialog systems, especially for the many real-world tasks that have a large and growing number of intents and slot types. While zero-shot learning approaches that require no labeled examples -- only features and auxiliary information -- have been proposed only for slot labeling, we show that one can profitably perform joint zero-shot intent classification and slot labeling. We demonstrate the value of capturing dependencies between intents and slots, and between different slots in an utterance, in the zero-shot setting. We describe NN architectures that translate between word and sentence embedding spaces, and demonstrate that these modifications are required to enable zero-shot learning for this task. We show a substantial improvement over strong baselines and explain the intuition behind each architectural modification through visualizations and ablation studies.
Submitted 28 November, 2022;
originally announced December 2022.
-
AdaFNIO: Adaptive Fourier Neural Interpolation Operator for video frame interpolation
Authors:
Hrishikesh Viswanath,
Md Ashiqur Rahman,
Rashmi Bhaskara,
Aniket Bera
Abstract:
We present AdaFNIO (Adaptive Fourier Neural Interpolation Operator), a neural operator-based architecture to perform video frame interpolation. Current deep learning based methods rely on local convolutions for feature learning and suffer from not being scale-invariant, thus requiring training data to be augmented through random flipping and re-scaling. AdaFNIO, on the other hand, learns the features in the frames, independent of input resolution, through token mixing and global convolution in the Fourier space or the spectral domain by using the Fast Fourier Transform (FFT). We show that AdaFNIO can produce visually smooth and accurate results. To evaluate the visual quality of our interpolated frames, we calculate the structural similarity index (SSIM) and Peak Signal to Noise Ratio (PSNR) between the generated frame and the ground truth frame. We provide the quantitative performance of our model on the Vimeo-90K, DAVIS, UCF101 and DISFA+ datasets.
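As a reference for the PSNR metric mentioned above, here is a minimal sketch using the standard definition PSNR = 10 log10(MAX^2 / MSE). It assumes 8-bit grayscale frames given as nested lists of pixel values; the function name `psnr` and the `max_value` parameter are illustrative, and the paper's actual evaluation code is not described in the abstract.

```python
import math


def psnr(generated, ground_truth, max_value=255.0):
    """Peak Signal to Noise Ratio (in dB) between two equally sized frames."""
    flat_generated = [pixel for row in generated for pixel in row]
    flat_truth = [pixel for row in ground_truth for pixel in row]
    if len(flat_generated) != len(flat_truth) or not flat_truth:
        raise ValueError("frames must be non-empty and have the same size")

    # Mean squared error over all pixels.
    mse = sum((g - t) ** 2 for g, t in zip(flat_generated, flat_truth)) / len(flat_truth)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a generated frame closer to the ground truth; identical frames give an infinite PSNR, which is why the zero-error case is handled separately.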
Submitted 8 March, 2023; v1 submitted 19 November, 2022;
originally announced November 2022.
-
A review of TinyML
Authors:
Harsha Yelchuri,
Rashmi R
Abstract:
In today's technological world, the application of machine learning is becoming ubiquitous. Incorporating machine learning algorithms on extremely low-power and inexpensive embedded devices at the edge level is now possible due to the combination of the Internet of Things (IoT) and edge computing. To estimate an outcome, traditional machine learning demands vast amounts of resources. The TinyML concept for embedded machine learning attempts to push such diversity from usual high-end approaches to low-end applications. TinyML is a rapidly expanding interdisciplinary topic at the convergence of machine learning, software, and hardware, centered on deploying deep neural network models on embedded (micro-controller-driven) systems. TinyML will pave the way for novel edge-level services and applications that rely on distributed edge inference and independent decision-making rather than server computation. In this paper, we explore TinyML's methodology, how TinyML can benefit a few specific industrial fields, its obstacles, and its future scope.
Submitted 5 November, 2022;
originally announced November 2022.